Layers

Deep learning models are often said to be made up of “layers”. Intuitively, a “layer” is a function that transforms some tensor into another tensor. DeepChem maintains an extensive collection of layers which perform various useful scientific transformations. For now, most layers are Keras-only, but over time we expect this support to expand to other types of models and layers.

Layers Cheatsheet

The “layers cheatsheet” lists various scientifically relevant differentiable layers implemented in DeepChem.

Note that some layers are implemented for specific model architectures, such as GROVER and attention layers; this is indicated in the Model column of the table.

In order to use the layers, make sure that the corresponding backend (TensorFlow with Keras, PyTorch, or JAX) is installed.
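A quick way to see which of these backends is importable is sketched below using only the Python standard library. The mapping from backend to import name (tensorflow, torch, jax) is our assumption about the usual package names, and the check says nothing about whether an installed version is one DeepChem supports.

```python
import importlib.util

# Assumed import names for the backends listed above.
BACKEND_MODULES = {
    "TensorFlow/Keras": "tensorflow",
    "PyTorch": "torch",
    "JAX": "jax",
}


def available_backends():
    """Map each backend name to True if its top-level module can be found.

    importlib.util.find_spec locates a module without importing it, so the
    check stays fast even when the heavy frameworks are installed.
    """
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKEND_MODULES.items()
    }


for name, ok in available_backends().items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```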

TensorFlow Keras Layers

These layers are subclasses of the tensorflow.keras.layers.Layer class.

Custom Keras Layers
Layer	Reference	Model
InteratomicL2Distances
GraphConv	ref
GraphPool	ref
GraphGather	ref
MolGANConvolutionLayer	ref	MolGan
MolGANAggregationLayer	ref	MolGan
MolGANMultiConvolutionLayer	ref	MolGan
MolGANEncoderLayer	ref	MolGan
LSTMStep
AttnLSTMEmbedding	ref
IterRefLSTMEmbedding
SwitchedDropout
WeightedLinearCombo
CombineMeanStd
Stack
VinaFreeEnergy
NeighborList
AtomicConvolution	ref
AlphaShareLayer		Sluice Network
SluiceLoss		Sluice Network
BetaShare		Sluice Network
ANIFeat
GraphEmbedPoolLayer	ref
Highway	ref
WeaveLayer	ref
WeaveGather	ref
DTNNEmbedding
DTNNStep
DTNNGather
DAGLayer	ref
DAGGather
MessagePassing	ref
EdgeNetwork	ref	MessagePassing
GatedRecurrentUnit	ref	MessagePassing
SetGather

PyTorch

These layers are subclasses of the torch.nn.Module class.

Custom PyTorch Layers
Layer	Reference	Model
MultilayerPerceptron
ScaleNorm	ref	Molecular Attention Transformer
MATEncoderLayer	ref	Molecular Attention Transformer
MultiHeadedMATAttention	ref	Molecular Attention Transformer
SublayerConnection	ref	Transformer
MATEmbedding	ref	Molecular Attention Transformer
MATGenerator	ref	Molecular Attention Transformer
Affine	ref	Normalizing Flow
RealNVPLayer	ref	Normalizing Flow
DMPNNEncoderLayer	ref	DMPNNModel
PositionwiseFeedForward	ref	Molecular Attention Transformer
GroverMPNEncoder	ref	Grover
GroverAttentionHead	ref	Grover
GroverMTBlock	ref	Grover
GroverTransEncoder	ref	Grover
GroverEmbedding	ref	Grover
GroverAtomVocabPredictor	ref	Grover
GroverBondVocabPredictor	ref	Grover
GroverFunctionalGroupPredictor	ref	Grover
ScaledDotProductAttention	ref	Transformer
SelfAttention	ref	Transformer
GroverReadout	ref	Grover
DFTXC	ref	XCModel-DFT
NNLDA	ref	XCModel-DFT
HybridXC	ref	XCModel-DFT
XCNNSCF	ref	XCModel-DFT
AtomEncoder	https://arxiv.org/abs/2110.04126	3D InfoMax
BondEncoder	https://arxiv.org/abs/2110.04126	3D InfoMax
Net3DLayer	https://arxiv.org/abs/2110.04126	3D InfoMax
Net3D	https://arxiv.org/abs/2110.04126	3D InfoMax
PNALayer	https://arxiv.org/abs/2004.05718	Principal Neighbourhood Aggregation
PNAGNN	https://arxiv.org/abs/2004.05718	Principal Neighbourhood Aggregation
EdgeNetwork	ref	Message Passing Neural Network
WeaveLayer	ref	WeaveModel
WeaveGather	ref	WeaveModel
GradientPenalty	ref	WGANModel
MolGANConvolutionLayer	ref	MolGan
MolGANAggregationLayer	ref	MolGan
MolGANMultiConvolutionLayer	ref	MolGan
MolGANEncoderLayer	ref	MolGan
DTNNEmbedding	https://arxiv.org/abs/1609.08259	DTNNModel
DTNNStep	https://arxiv.org/abs/1609.08259	DTNNModel
DTNNGather	https://arxiv.org/abs/1609.08259	DTNNModel
MXMNetGlobalMessagePassing	ref	MXMNetModel
MXMNetBesselBasisLayer	ref	MXMNetModel
VariationalRandomizer	ref	SeqToSeqModel
EncoderRNN	ref	SeqToSeqModel
DecoderRNN	ref	SeqToSeqModel
FerminetElectronFeature	ref	FerminetModel
FerminetEnvelope	ref	FerminetModel
MXMNetLocalMessagePassing	ref	MXMNetModel
MXMNetSphericalBasisLayer	https://arxiv.org/pdf/2011.07457	MXMNetModel
HighwayLayer	ref
GraphConv	ref
GraphPool	ref
GraphGather	ref
IRVLayer	ref
DAGLayer	ref
DAGGather	ref

Keras Layers

class InteratomicL2Distances(*args, **kwargs)

Compute (squared) L2 distances between atoms given neighbors.

This class computes the distance between each atom and each of its listed neighbors.

Examples

>>> import numpy as np
>>> import deepchem as dc
>>> atoms = 5
>>> neighbors = 2
>>> coords = np.random.rand(atoms, 3)
>>> neighbor_list = np.random.randint(0, atoms, size=(atoms, neighbors))
>>> layer = dc.models.layers.InteratomicL2Distances(atoms, neighbors, 3)
>>> result = np.array(layer([coords, neighbor_list]))
>>> result.shape
(5, 2)

__init__(N_atoms: int, M_nbrs: int, ndim: int, **kwargs)

Constructor for this layer.

Parameters:

N_atoms (int) – Total number of atoms in the system.
M_nbrs (int) – Number of neighbors to consider when computing distances.
ndim (int) – Number of coordinate dimensions per atom (3 in the example above).

get_config() → Dict: Returns config dictionary for this layer.

call(inputs: List)

Invokes this layer.

Parameters: inputs (list) – Should be of the form inputs=[coords, nbr_list], where coords is a tensor of shape (N_atoms, ndim) and nbr_list is an integer tensor of shape (N_atoms, M_nbrs) holding the neighbor indices of each atom.
Return type: Tensor of shape (N_atoms, M_nbrs) with (squared) interatomic distances.
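To make the shapes concrete, here is a NumPy sketch of the same computation (squared distances, no square root), assuming coords has shape (N_atoms, ndim) as in the example above. interatomic_l2_distances is a hypothetical helper for illustration, not part of DeepChem.

```python
import numpy as np


def interatomic_l2_distances(coords, nbr_list):
    """Squared L2 distance from each atom to each of its listed neighbors.

    coords:   (N_atoms, ndim) float array of atomic positions
    nbr_list: (N_atoms, M_nbrs) int array; row i holds the neighbor indices of atom i
    returns:  (N_atoms, M_nbrs) array of squared distances
    """
    nbr_coords = coords[nbr_list]            # (N_atoms, M_nbrs, ndim) neighbor positions
    diffs = coords[:, None, :] - nbr_coords  # each atom broadcast against its neighbors
    return np.sum(diffs ** 2, axis=-1)       # no square root: squared distances


coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
nbr_list = np.array([[1, 2],
                     [0, 2],
                     [0, 1]])
print(interatomic_l2_distances(coords, nbr_list))
# [[1. 4.]
#  [1. 5.]
#  [4. 5.]]
```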

class GraphConv(*args, **kwargs)

Graph Convolutional Layers

This layer implements the graph convolution introduced in [1]. The graph convolution combines per-node feature vectors in a nonlinear fashion with the feature vectors for neighboring nodes. This “blends” information in local neighborhoods of a graph.

References

__init__(out_channel: int, min_deg: int = 0, max_deg: int = 10, activation_fn: Callable | None = None, **kwargs)

Initialize a graph convolutional layer.

Parameters:

out_channel (int) – The number of output channels per graph node.
min_deg (int, optional (default 0)) – The minimum allowed degree for each graph node.
max_deg (int, optional (default 10)) – The maximum allowed degree for each graph node. Note that this is set to 10 to handle complex molecules (some organometallic compounds have strange structures). If you’re using this for non-molecular applications, you may need to set this much higher depending on your dataset.
activation_fn (function) – A nonlinear activation function to apply. If you’re not sure, tf.nn.relu is probably a good default for your application.

build(input_shape)

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

get_config()

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns: Python dictionary.

call(inputs)

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs – Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc., is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if inputs did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

sum_neigh(atoms, deg_adj_lists): Store the summed atoms by degree.
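The blending step can be sketched in NumPy: each node's own features and the sum of its neighbors' features are multiplied by weight matrices chosen by the node's degree, then added. graph_conv_sketch is hypothetical, and the per-degree weight layout (with no bias terms) is an assumption for illustration rather than DeepChem's exact implementation, which operates on degree-sorted batch tensors.

```python
import numpy as np


def graph_conv_sketch(node_feats, neighbors, w_self, w_neigh, activation=None):
    """One graph-convolution step with degree-specific weights.

    node_feats:      (n_nodes, in_dim) array of per-node features
    neighbors:       list of lists; neighbors[i] holds the indices adjacent to node i
    w_self, w_neigh: (max_deg + 1, in_dim, out_dim); one matrix per degree
    """
    n_nodes, in_dim = node_feats.shape
    out = np.zeros((n_nodes, w_self.shape[2]))
    for i, nbrs in enumerate(neighbors):
        deg = len(nbrs)
        # Sum ("blend") the neighbors' feature vectors; zero vector if isolated.
        neigh_sum = node_feats[nbrs].sum(axis=0) if deg else np.zeros(in_dim)
        # Self term and neighbor term each use the weight matrix for this degree.
        out[i] = node_feats[i] @ w_self[deg] + neigh_sum @ w_neigh[deg]
    return activation(out) if activation is not None else out


# Path graph 0 - 1 - 2 with one-hot features and identity weights.
feats = np.eye(3)
neighbors = [[1], [0, 2], [1]]
w = np.stack([np.eye(3)] * 3)  # one identity matrix for each degree 0, 1, 2
print(graph_conv_sketch(feats, neighbors, w, w))
# [[1. 1. 0.]
#  [1. 1. 1.]
#  [0. 1. 1.]]
```

With identity weights, each output row is simply the node's one-hot vector plus those of its neighbors, which makes the neighborhood blending easy to read off.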

class GraphPool(*args, **kwargs)

A GraphPool gathers data from local neighborhoods of a graph.

This layer does a max-pooling over the feature vectors of atoms in a neighborhood. You can think of this layer as analogous to a max-pooling layer for 2D convolutions, but operating on graphs instead. This technique is described in [1].

References

__init__(min_degree=0, max_degree=10, **kwargs)

Initialize this layer.

Parameters:

min_degree (int, optional (default 0)) – The minimum allowed degree for each graph node.
max_degree (int, optional (default 10)) – The maximum allowed degree for each graph node. Note that this is set to 10 to handle complex molecules (some organometallic compounds have strange structures). If you’re using this for non-molecular applications, you may need to set this much higher depending on your dataset.

get_config()

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns: Python dictionary.

call(inputs)

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs – Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc., is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs – Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if inputs did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.
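A minimal NumPy sketch of the max-pooling described for GraphPool, assuming that each node's own features are pooled together with its neighbors'. graph_pool_sketch is a hypothetical helper; the real layer works on DeepChem's degree-sorted batch tensors rather than on a neighbor list.

```python
import numpy as np


def graph_pool_sketch(node_feats, neighbors):
    """Max-pool each node's features with those of its neighbors.

    node_feats: (n_nodes, dim) array
    neighbors:  list of lists; neighbors[i] holds the indices adjacent to node i
    """
    pooled = np.empty_like(node_feats)
    for i, nbrs in enumerate(neighbors):
        # Include the node itself, then take an element-wise max over the neighborhood.
        pooled[i] = node_feats[[i] + list(nbrs)].max(axis=0)
    return pooled


feats = np.array([[1.0, 5.0],
                  [3.0, 0.0],
                  [2.0, 4.0]])
neighbors = [[1], [0, 2], [1]]
print(graph_pool_sketch(feats, neighbors))
# [[3. 5.]
#  [3. 5.]
#  [3. 4.]]
```

As with 2D max-pooling, the output keeps one vector per node, but each feature now reflects the strongest activation in that node's local neighborhood.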

class GraphGather(*args, **kwargs)

A GraphGather layer pools node-level feature vectors to create a graph feature vector.

Many graph convolutional networks manipulate feature vectors per graph node. For a molecule, for example, each node might represent an atom, and the network would manipulate atomic feature vectors that summarize the local chemistry of the atom. However, at the end of the application, we will likely want to work with a molecule-level feature representation. The GraphGather layer creates a graph-level feature vector by combining all the node-level feature vectors.

One subtlety about this layer is that it depends on the batch_size. This is done for internal implementation reasons. The GraphConv and GraphPool layers pool all nodes from all graphs in a batch that’s being processed. The GraphGather layer reassembles these jumbled node feature vectors into per-graph feature vectors.

References

__init__(batch_size, activation_fn=None, **kwargs)

Initialize this layer.

Parameters:

batch_size (int) – The batch size for this layer. Note that the layer’s behavior changes depending on the batch size.
activation_fn (function) – A nonlinear activation function to apply. If you’re not sure, tf.nn.relu is probably a good default for your application.

get_config()

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns: Python dictionary.

call(inputs)

Invoke this layer.

Parameters: inputs (list) – This list should consist of inputs = [atom_features, deg_slice, membership, deg_adj_list placeholders…]. These are all tensors that are created/processed by GraphConv and GraphPool.
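The reassembly step can be sketched with NumPy scatter operations. Here membership gives the graph index of every atom in the batch, which is why the output needs a fixed number of rows (batch_size). Concatenating a per-graph sum with a per-graph max is an assumption about the readout, and graph_gather_sketch is a hypothetical name; consult the DeepChem source for the exact pooling and how activation_fn is applied.

```python
import numpy as np


def graph_gather_sketch(atom_feats, membership, batch_size):
    """Pool node features into one vector per graph.

    atom_feats: (n_atoms_in_batch, dim) features of all atoms in the batch
    membership: (n_atoms_in_batch,) int array; membership[k] is the graph of atom k
    returns:    (batch_size, 2 * dim) array holding [sum over atoms | max over atoms]
    """
    dim = atom_feats.shape[1]
    sums = np.zeros((batch_size, dim))
    maxes = np.full((batch_size, dim), -np.inf)  # a graph with no atoms keeps -inf
    # ufunc.at is unbuffered, so repeated graph indices accumulate correctly.
    np.add.at(sums, membership, atom_feats)
    np.maximum.at(maxes, membership, atom_feats)
    return np.concatenate([sums, maxes], axis=1)


atom_feats = np.array([[1.0, 2.0],
                       [3.0, 0.0],
                       [5.0, 5.0]])
membership = np.array([0, 0, 1])  # atoms 0 and 1 form graph 0, atom 2 is graph 1
print(graph_gather_sketch(atom_feats, membership, batch_size=2))
# [[4. 2. 3. 2.]
#  [5. 5. 5. 5.]]
```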

class MolGANConvolutionLayer(*args, **kwargs)

Graph convolution layer used in the MolGAN model. MolGAN is a WGAN-type model for generation of small molecules. This layer is not used directly; higher-level layers such as MolGANMultiConvolutionLayer use it. It performs basic convolution on one-hot encoded matrices containing atom and bond information. The layer also accepts three inputs for the case when convolution is performed more than once and the results of a previous convolution need to be used; this avoids creating another layer that accepts three inputs rather than two. The last input is the so-called hidden_layer, which holds the results of the convolution, while the first two are the unchanged input tensors.

Example

See MolGANMultiConvolutionLayer for usage inside other layers.

>>> from deepchem.models.layers import MolGANConvolutionLayer
>>> from tensorflow.keras import Model
>>> from tensorflow.keras.layers import Input
>>> vertices = 9
>>> nodes = 5
>>> edges = 5
>>> units = 128

>>> layer1 = MolGANConvolutionLayer(units=units,edges=edges, name='layer1')
>>> layer2 = MolGANConvolutionLayer(units=units,edges=edges, name='layer2')
>>> adjacency_tensor= Input(shape=(vertices, vertices, edges))
>>> node_tensor = Input(shape=(vertices,nodes))
>>> hidden1 = layer1([adjacency_tensor,node_tensor])
>>> output = layer2(hidden1)
>>> model = Model(inputs=[adjacency_tensor,node_tensor], outputs=[output])

References

__init__(units: int, activation: Callable = tanh, dropout_rate: float = 0.0, edges: int = 5, name: str = '', **kwargs)

Initialize this layer.

Parameters:

units (int) – Dimension of dense layers used for convolution
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Dropout rate used by dropout layer
edges (int, optional (default=5)) – How many dense layers to use in convolution. Typically equal to number of bond types used in the model.
name (string, optional (default="")) – Name of the layer

call(inputs, training=False)

Invoke this layer

Parameters:

inputs (list) – List of two input matrices, adjacency tensor and node features tensors in one-hot encoding format.
training (bool) – Should this layer be run in training mode. Typically decided by main model, influences things like dropout.

Returns:

The first and second elements are the original input tensors; the third is the result of the convolution.

Return type:

tuple(tf.Tensor,tf.Tensor,tf.Tensor)

get_config() → Dict: Returns config dictionary for this layer.
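The convolution can be sketched in NumPy as a relational graph convolution: one weight matrix per bond channel, neighbor messages summed over bonds, plus a self-connection, followed by tanh. molgan_conv_sketch and its weight layout are hypothetical. The DeepChem layer uses Keras Dense layers with biases and dropout, and it may treat one bond channel (such as a "no bond" channel) specially; none of that is reproduced here.

```python
import numpy as np


def molgan_conv_sketch(adjacency, nodes, w_edge, w_self, hidden=None):
    """One relational graph-convolution step in the spirit of MolGAN.

    adjacency: (batch, V, V, E) one-hot bond tensor with E bond channels
    nodes:     (batch, V, N) one-hot atom-type tensor
    w_edge:    (E, D_in, units) one weight matrix per bond channel
    w_self:    (N, units) weights for each node's own features
    hidden:    optional (batch, V, H) output of a previous convolution; when
               given it is concatenated with the node features, so D_in = H + N
    Returns (adjacency, nodes, new_hidden), mirroring the layer's three outputs.
    """
    annotations = nodes if hidden is None else np.concatenate([hidden, nodes], axis=-1)
    # Transform the annotations separately for every bond channel: (batch, E, V, units)
    per_edge = np.einsum("bvd,edu->bevu", annotations, w_edge)
    # Put the bond channel before the vertex axes: (batch, E, V, V)
    adj = np.transpose(adjacency, (0, 3, 1, 2))
    # Sum messages from bonded neighbors over all channels: (batch, V, units)
    messages = np.einsum("beij,beju->biu", adj, per_edge)
    # Add a self-connection and apply the tanh nonlinearity.
    new_hidden = np.tanh(messages + nodes @ w_self)
    return adjacency, nodes, new_hidden


rng = np.random.default_rng(0)
batch, V, E, N, units = 1, 4, 5, 5, 8
adjacency = np.zeros((batch, V, V, E))
adjacency[0, 0, 1, 1] = adjacency[0, 1, 0, 1] = 1.0  # one bond of type 1 between atoms 0 and 1
nodes = np.eye(N)[rng.integers(0, N, size=(batch, V))]  # random one-hot atom types
_, _, h1 = molgan_conv_sketch(adjacency, nodes,
                              rng.normal(size=(E, N, units)), rng.normal(size=(N, units)))
# Second pass reuses h1 as the hidden input, as the three-input case above describes.
_, _, h2 = molgan_conv_sketch(adjacency, nodes,
                              rng.normal(size=(E, units + N, units)), rng.normal(size=(N, units)),
                              hidden=h1)
print(h1.shape, h2.shape)  # (1, 4, 8) (1, 4, 8)
```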

class MolGANAggregationLayer(*args, **kwargs)

Graph aggregation layer used in the MolGAN model. MolGAN is a WGAN-type model for generation of small molecules. It performs aggregation on the tensor resulting from the convolution layers. Given its simple nature, it might be removed in the future and merged into MolGANEncoderLayer.

Example

>>> from deepchem.models.layers import MolGANConvolutionLayer, MolGANAggregationLayer
>>> from tensorflow.keras import Model
>>> from tensorflow.keras.layers import Input
>>> vertices = 9
>>> nodes = 5
>>> edges = 5
>>> units = 128

>>> layer_1 = MolGANConvolutionLayer(units=units,edges=edges, name='layer1')
>>> layer_2 = MolGANConvolutionLayer(units=units,edges=edges, name='layer2')
>>> layer_3 = MolGANAggregationLayer(units=128, name='layer3')
>>> adjacency_tensor= Input(shape=(vertices, vertices, edges))
>>> node_tensor = Input(shape=(vertices,nodes))
>>> hidden_1 = layer_1([adjacency_tensor,node_tensor])
>>> hidden_2 = layer_2(hidden_1)
>>> output = layer_3(hidden_2[2])
>>> model = Model(inputs=[adjacency_tensor,node_tensor], outputs=[output])

References

__init__(units: int = 128, activation: Callable = tanh, dropout_rate: float = 0.0, name: str = '', **kwargs)

Initialize the layer

Parameters:

units (int, optional (default=128)) – Dimension of dense layers used for aggregation
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
name (string, optional (default="")) – Name of the layer

call(inputs, training=False)

Invoke this layer

Parameters:

inputs (List) – Single tensor resulting from graph convolution layer
training (bool) – Should this layer be run in training mode. Typically decided by main model, influences things like dropout.

Returns:

aggregation tensor – Result of aggregation function on input convolution tensor.

Return type:

tf.Tensor

get_config() → Dict: Returns config dictionary for this layer.
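The aggregation can be sketched as a simplified form of the gated sum described in the MolGAN paper: one projection passed through a sigmoid decides how much each node contributes, a second projection decides what it contributes, and the products are summed over nodes. The function below is a hypothetical NumPy illustration that omits the biases and dropout of the actual layer.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def molgan_aggregation_sketch(node_hidden, w_gate, w_value, activation=np.tanh):
    """Gated sum over nodes, giving one vector per graph.

    node_hidden:     (batch, V, D) output of the convolution layers
    w_gate, w_value: (D, units) weights of two dense projections (biases omitted)
    returns:         (batch, units)
    """
    gate = sigmoid(node_hidden @ w_gate)             # soft gate: how much each node counts
    value = activation(node_hidden @ w_value)        # what each node contributes
    return activation(np.sum(gate * value, axis=1))  # sum over the V nodes, then squash


node_hidden = np.array([[[1.0], [2.0]]])  # batch=1, V=2, D=1
out = molgan_aggregation_sketch(node_hidden, np.zeros((1, 1)), np.ones((1, 1)))
print(out)  # equals tanh(0.5 * (tanh(1) + tanh(2))), since every gate is sigmoid(0) = 0.5
```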

class MolGANMultiConvolutionLayer(*args, **kwargs)

Multiple-pass convolution layer used in the MolGAN model. MolGAN is a WGAN-type model for generation of small molecules. It takes the outputs of the previous convolution layer and uses them as inputs for the next one. It simplifies the overall framework, but might be merged into MolGANEncoderLayer in the future to reduce the number of layers.

Example

>>> from deepchem.models.layers import MolGANMultiConvolutionLayer, MolGANAggregationLayer
>>> from tensorflow.keras import Model
>>> from tensorflow.keras.layers import Input
>>> vertices = 9
>>> nodes = 5
>>> edges = 5
>>> units = 128

>>> layer_1 = MolGANMultiConvolutionLayer(units=(128,64), name='layer1')
>>> layer_2 = MolGANAggregationLayer(units=128, name='layer2')
>>> adjacency_tensor= Input(shape=(vertices, vertices, edges))
>>> node_tensor = Input(shape=(vertices,nodes))
>>> hidden = layer_1([adjacency_tensor,node_tensor])
>>> output = layer_2(hidden)
>>> model = Model(inputs=[adjacency_tensor,node_tensor], outputs=[output])

References

__init__(units: Tuple = (128, 64), activation: Callable = tanh, dropout_rate: float = 0.0, edges: int = 5, name: str = '', **kwargs)

Initialize the layer

Parameters:

units (Tuple, optional (default=(128, 64))) – Dimensions used by consecutive convolution layers; at least two values are required. Each additional value adds one more convolution layer.
activation (function, optional (default=tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
edges (int, optional (default=5)) – Controls how many dense layers are used for a single convolution unit. Typically matches the number of bond types used in the molecule.
name (string, optional (default="")) – Name of the layer

call(inputs, training=False)

Invoke this layer

Parameters:

inputs (list) – List of two input matrices, adjacency tensor and node features tensors in one-hot encoding format.
training (bool) – Should this layer be run in training mode. Typically decided by main model, influences things like dropout.

Returns:

convolution tensor – Result of input tensors going through convolution a number of times.

Return type:

tf.Tensor

get_config() → Dict: Returns config dictionary for this layer.

class MolGANEncoderLayer(*args, **kwargs)

Main learning layer used by the MolGAN model. MolGAN is a WGAN-type model for generation of small molecules. Its role is to further simplify the model. This layer can be manually built by stacking graph convolution layers followed by graph aggregation.

Example

>>> from deepchem.models.layers import MolGANEncoderLayer
>>> from tensorflow.keras import Model
>>> from tensorflow.keras.layers import Input, Dropout,Dense
>>> vertices = 9
>>> edges = 5
>>> nodes = 5
>>> dropout_rate = .0
>>> adjacency_tensor= Input(shape=(vertices, vertices, edges))
>>> node_tensor = Input(shape=(vertices, nodes))

>>> graph = MolGANEncoderLayer(units = [(128,64),128], dropout_rate= dropout_rate, edges=edges)([adjacency_tensor,node_tensor])
>>> dense = Dense(units=128, activation='tanh')(graph)
>>> dense = Dropout(dropout_rate)(dense)
>>> dense = Dense(units=64, activation='tanh')(dense)
>>> dense = Dropout(dropout_rate)(dense)
>>> output = Dense(units=1)(dense)

>>> model = Model(inputs=[adjacency_tensor,node_tensor], outputs=[output])

References

__init__(units: List = [(128, 64), 128], activation: Callable = tanh, dropout_rate: float = 0.0, edges: int = 5, name: str = '', **kwargs)

Initialize the layer.

Parameters:

units (List, optional (default=[(128, 64), 128])) – List of units for MolGANMultiConvolutionLayer and MolGANAggregationLayer, i.e. [(128,64),128] means two convolution layers with dims = [128,64] followed by an aggregation layer with dims = 128.
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
edges (int, optional (default=5)) – Controls how many dense layers are used for a single convolution unit. Typically matches the number of bond types used in the molecule.
name (string, optional (default="")) – Name of the layer

call(inputs, training=False)

Invoke this layer

Parameters:

inputs (list) – List of two input matrices, adjacency tensor and node features tensors in one-hot encoding format.
training (bool) – Should this layer be run in training mode. Typically decided by main model, influences things like dropout.

Returns:

encoder tensor – Tensor that has been through a number of convolutions followed by aggregation.

Return type:

tf.Tensor

get_config() → Dict: Returns config dictionary for this layer.

class LSTMStep(*args, **kwargs)

Layer that performs a single step LSTM update.

This layer performs a single step LSTM update. Note that it is not a full LSTM recurrent network. The LSTMStep layer is useful as a primitive for designing layers such as the AttnLSTMEmbedding or the IterRefLSTMEmbedding below.

__init__(output_dim, input_dim, init_fn='glorot_uniform', inner_init_fn='orthogonal', activation_fn='tanh', inner_activation_fn='hard_sigmoid', **kwargs)

Parameters:

output_dim (int) – Dimensionality of output vectors.
input_dim (int) – Dimensionality of input vectors.
init_fn (str) – TensorFlow initialization to use for W.
inner_init_fn (str) – TensorFlow initialization to use for U.
activation_fn (str) – TensorFlow activation to use for output.
inner_activation_fn (str) – TensorFlow activation to use for inner steps.

get_config()

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns: Python dictionary.

build(input_shape): Constructs learnable weights for this layer.

call(inputs)

Execute this layer on input tensors.

Parameters: inputs (list) – List of three tensors (x, h_tm1, c_tm1). h_tm1 means “h, t-1”.
Returns: Returns h, [h, c]
Return type: list
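For reference, a standard single-step LSTM update can be written in NumPy as below. This is a sketch: packing the four gates into one W and one U, the gate order, and the name lstm_step are assumptions. hard_sigmoid uses the classic Keras piecewise-linear form; newer Keras releases may define it with a different slope.

```python
import numpy as np


def hard_sigmoid(x):
    # Piecewise-linear approximation of the sigmoid (classic Keras form).
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)


def lstm_step(x, h_tm1, c_tm1, W, U, b, activation=np.tanh, inner_activation=hard_sigmoid):
    """Single LSTM update, returning h, [h, c] like the layer's call().

    x: (batch, input_dim); h_tm1, c_tm1: (batch, output_dim)
    W: (input_dim, 4 * output_dim); U: (output_dim, 4 * output_dim); b: (4 * output_dim,)
    Gate order along the last axis is assumed to be [input, forget, candidate, output].
    """
    z = x @ W + h_tm1 @ U + b
    z_i, z_f, z_c, z_o = np.split(z, 4, axis=-1)
    i = inner_activation(z_i)            # input gate
    f = inner_activation(z_f)            # forget gate
    c = f * c_tm1 + i * activation(z_c)  # new cell state
    o = inner_activation(z_o)            # output gate
    h = o * activation(c)                # new hidden state
    return h, [h, c]


batch, input_dim, output_dim = 2, 3, 4
x = np.ones((batch, input_dim))
h0, c0 = np.zeros((batch, output_dim)), np.ones((batch, output_dim))
W = np.zeros((input_dim, 4 * output_dim))
U = np.zeros((output_dim, 4 * output_dim))
b = np.zeros(4 * output_dim)
h, (h_again, c) = lstm_step(x, h0, c0, W, U, b)
# With all-zero weights every gate is hard_sigmoid(0) = 0.5, so c = 0.5 * c0 = 0.5.
```

Because this performs only one step, a full recurrent network (or a layer such as AttnLSTMEmbedding) would call it repeatedly, feeding each returned [h, c] back in as the next h_tm1 and c_tm1.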

class AttnLSTMEmbedding(*args, **kwargs)

Implements AttnLSTM as in the Matching Networks paper.

The AttnLSTM embedding adjusts two sets of vectors, the “test” and “support” sets. The “support” consists of a set of evidence vectors. Think of these as the small training set for low-data machine learning. The “test” consists of the queries we wish to answer with the small amounts of available data. The AttnLSTMEmbedding allows us to modify the embedding of the “test” set depending on the contents of the “support”. The AttnLSTMEmbedding is thus a type of learnable metric that allows a network to modify its internal notion of distance.

See references [1] [2] for more details.

References

__init__(n_test, n_support, n_feat, max_depth, **kwargs)

Parameters:

n_support (int) – Size of support set.
n_test (int) – Size of test set.
n_feat (int) – Number of features per atom
max_depth (int) – Number of “processing steps” used by sequence-to-sequence for sets model.

get_config()

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns: Python dictionary.

build(input_shape)

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)

Execute this layer on input tensors.

Parameters: inputs (list) – List of two tensors (X, Xp). X should be of shape (n_test, n_feat) and Xp should be of shape (n_support, n_feat) where n_test is the size of the test set, n_support that of the support set, and n_feat is the number of per-atom features.
Returns: Returns two tensors of the same shape as the input. Namely, the output shape will be [(n_test, n_feat), (n_support, n_feat)]
Return type: list
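The attention read at the core of this embedding, in the style of the Matching Networks paper, can be sketched in NumPy: score each test vector (shifted by the current query state q) against every support vector with cosine similarity, softmax the scores, and take the weighted sum of support vectors. This covers a single read only; in the layer, q is refined by an LSTM over max_depth steps, which is omitted here, and all names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T


def softmax(x):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def attention_read(x, xp, q):
    """One attention read over the support set.

    x:  (n_test, n_feat) test embeddings
    xp: (n_support, n_feat) support embeddings
    q:  (n_test, n_feat) current query state (updated by an LSTM in the real layer)
    """
    e = cosine_similarity(x + q, xp)  # (n_test, n_support) similarity scores
    a = softmax(e)                    # attention weights; each row sums to 1
    r = a @ xp                        # (n_test, n_feat) read-out from the support set
    return a, r


rng = np.random.default_rng(0)
n_test, n_support, n_feat = 3, 5, 4
x = rng.normal(size=(n_test, n_feat))
xp = rng.normal(size=(n_support, n_feat))
q = np.zeros_like(x)  # initial query state
a, r = attention_read(x, xp, q)
print(a.shape, r.shape)  # (3, 5) (3, 4)
```

The read-out r has the same shape as the test set, which is what lets the layer adjust the “test” embeddings using what it attended to in the “support”.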

class IterRefLSTMEmbedding(*args, **kwargs)

Implements the Iterative Refinement LSTM.

Much like AttnLSTMEmbedding, the IterRefLSTMEmbedding is another type of learnable metric which adjusts “test” and “support.” Recall that “support” is the small amount of data available in a low-data machine learning problem, and that “test” is the query. The AttnLSTMEmbedding only modifies the “test” based on the contents of the support. However, the IterRefLSTM modifies both the “support” and “test” based on each other. This allows the learnable metric to be more malleable than that from AttnLSTMEmbedding.

__init__(n_test, n_support, n_feat, max_depth, **kwargs)

Unlike the AttnLSTM model which only modifies the test vectors additively, this model allows for an additive update to be performed to both test and support using information from each other.

Parameters:

n_support (int) – Size of support set.
n_test (int) – Size of test set.
n_feat (int) – Number of input atom features
max_depth (int) – Number of LSTM Embedding layers.

get_config()

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns: Python dictionary.

build(input_shape)

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)

Execute this layer on input tensors.

Parameters:

inputs (list) – List of two tensors (X, Xp). X should be of shape (n_test, n_feat) and Xp should be of shape (n_support, n_feat) where n_test is the size of the test set, n_support that of the support set, and n_feat is the number of per-atom features.

Returns:

Returns two tensors of the same shape as the input. Namely, the output shape will be [(n_test, n_feat), (n_support, n_feat)]

class SwitchedDropout(*args, **kwargs)

Apply dropout based on an input.

This is required for uncertainty prediction. The standard Keras Dropout layer only performs dropout during training, but we sometimes need to do it during prediction. The second input to this layer should be a scalar equal to 0 or 1, indicating whether to perform dropout.
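A NumPy sketch of this behavior, assuming the switch simply multiplies the dropout rate (so a switch of 0 disables dropout). Survivors are rescaled by 1/(1 - rate), the inverted-dropout convention that tf.nn.dropout also follows. switched_dropout is a hypothetical helper, not the layer's code.

```python
import numpy as np


def switched_dropout(x, switch, rate, rng=None):
    """Dropout that is applied only when switch == 1, at training or prediction time.

    x:      array of activations
    switch: scalar 0 or 1 (the layer's second input)
    rate:   fraction of units to drop when the switch is on
    """
    effective_rate = rate * switch  # rate becomes 0 when the switch is off
    if effective_rate == 0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= effective_rate
    # Inverted dropout: rescale survivors so the expected activation is unchanged.
    return np.where(keep, x / (1.0 - effective_rate), 0.0)


x = np.ones((2, 3))
print(switched_dropout(x, switch=0, rate=0.5))  # unchanged: dropout is off
rng = np.random.default_rng(0)
print(switched_dropout(x, switch=1, rate=0.5, rng=rng))  # every entry is 0.0 or 2.0
```

Calling it repeatedly with switch=1 at prediction time gives a spread of outputs, which is the kind of sampling that dropout-based uncertainty estimates rely on.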

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class WeightedLinearCombo(*args, **kwargs)[source]¶

Computes a weighted linear combination of input layers, with the weights defined by trainable variables.

__init__(std=0.3, **kwargs)[source]¶

Initialize this layer.

Parameters:: std (float, optional (default 0.3)) – The standard deviation to use when randomly initializing weights.
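The computation can be sketched in NumPy. The sketch assumes one trainable scalar weight per input, drawn from a zero-mean normal distribution with standard deviation std. The helper names init_combo_weights and weighted_linear_combo are hypothetical. The real layer may use a different initializer, for example a truncated normal.

```python
import numpy as np

def init_combo_weights(n_inputs, std=0.3, seed=0):
    """Hypothetical helper: one scalar weight per input, drawn from N(0, std^2)."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=std, size=n_inputs)

def weighted_linear_combo(inputs, weights):
    """Return sum_i weights[i] * inputs[i]; all inputs must share one shape."""
    out = np.zeros_like(inputs[0], dtype=float)
    for w, x in zip(weights, inputs):
        out += w * x
    return out

a = np.array([1.0, 2.0])
b = np.array([10.0, 20.0])
weights = init_combo_weights(2)  # random initial weights, std 0.3
combo = weighted_linear_combo([a, b], np.array([0.5, 0.25]))  # fixed weights for a readable result
```

With the fixed weights 0.5 and 0.25, combo equals [3.0, 6.0].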

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class CombineMeanStd(*args, **kwargs)[source]¶

Generate Gaussian noise.

__init__(training_only=False, noise_epsilon=1.0, **kwargs)[source]¶

Create a CombineMeanStd layer.

This layer should have two inputs with the same shape, and its output also has the same shape. Each element of the output is a Gaussian distributed random number whose mean is the corresponding element of the first input, and whose standard deviation is the corresponding element of the second input.

Parameters:

training_only (bool) – if True, noise is only generated during training. During prediction, the output is simply equal to the first input (that is, the mean of the distribution used during training).
noise_epsilon (float) – The noise is scaled by this factor.
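Here is a NumPy sketch of the behavior described above. It assumes the noise is a standard normal sample multiplied by noise_epsilon and by the second input. The function combine_mean_std is hypothetical and is not the layer's code.

```python
import numpy as np

def combine_mean_std(mean, std, training=True, training_only=False,
                     noise_epsilon=1.0, rng=None):
    """Hypothetical sketch of CombineMeanStd's documented behavior."""
    if training_only and not training:
        return mean  # prediction with training_only=True: output is the mean
    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.standard_normal(mean.shape) * noise_epsilon
    return mean + std * noise  # sample from N(mean, (noise_epsilon * std)^2)

mean = np.array([0.0, 5.0])
std = np.array([1.0, 0.0])
pred = combine_mean_std(mean, std, training=False, training_only=True)
sample = combine_mean_std(mean, std, rng=np.random.default_rng(0))
```

Where the standard deviation is zero, the sampled element equals the mean exactly, as the second element of sample shows.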

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

call(inputs, training=True)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class Stack(*args, **kwargs)[source]¶

Stack the inputs along a new axis.
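For intuition, the operation corresponds to NumPy's np.stack. The sketch uses axis=1, which keeps the batch dimension first. That axis value is an assumption, because the signature above does not show which axis the layer stacks along.

```python
import numpy as np

# Three inputs of shape (batch, features) = (2, 4); input i is filled with i.
inputs = [np.full((2, 4), fill_value=i, dtype=float) for i in range(3)]

# Stacking along a new axis 1 gives shape (batch, n_inputs, features).
stacked = np.stack(inputs, axis=1)
```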

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class VinaFreeEnergy(*args, **kwargs)[source]¶

Computes free-energy as defined by Autodock Vina.

TODO(rbharath): Make this layer support batching.

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

nonlinearity(c, w)[source]¶: Computes non-linearity used in Vina.

repulsion(d)[source]¶: Computes Autodock Vina’s repulsion interaction term.

hydrophobic(d)[source]¶: Computes Autodock Vina’s hydrophobic interaction term.

hydrogen_bond(d)[source]¶: Computes Autodock Vina’s hydrogen bond interaction term.

gaussian_first(d)[source]¶: Computes Autodock Vina’s first Gaussian interaction term.

gaussian_second(d)[source]¶: Computes Autodock Vina’s second Gaussian interaction term.
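The interaction terms can be sketched with the piecewise forms published for the AutoDock Vina scoring function. In these forms, d is the surface distance between two atoms: their center-to-center distance minus both van der Waals radii. This is a NumPy illustration, not DeepChem's code. The constants come from the Vina paper, not from this page. The n_rot argument of nonlinearity (the number of rotatable bonds) is a hypothetical stand-in for however the layer stores that count.

```python
import numpy as np

def gaussian_first(d):
    # Narrow attractive Gaussian centered at d = 0.
    return np.exp(-(d / 0.5) ** 2)

def gaussian_second(d):
    # Broad attractive Gaussian centered at d = 3.
    return np.exp(-((d - 3.0) / 2.0) ** 2)

def repulsion(d):
    # Penalizes only overlapping atoms (d < 0).
    return np.where(d < 0, d ** 2, 0.0)

def hydrophobic(d):
    # 1 below d = 0.5, linear ramp down to 0 at d = 1.5, 0 beyond.
    return np.where(d < 0.5, 1.0, np.where(d < 1.5, 1.5 - d, 0.0))

def hydrogen_bond(d):
    # 1 below d = -0.7, linear ramp down to 0 at d = 0, 0 beyond.
    return np.where(d < -0.7, 1.0, np.where(d < 0, -d / 0.7, 0.0))

def nonlinearity(c, w, n_rot):
    # Divides the summed energy c by a rotatable-bond penalty (n_rot is hypothetical).
    return c / (1.0 + w * n_rot)

d = np.array([-1.0, 0.0, 1.0, 2.0])
```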

call(inputs)[source]¶

Parameters:

X (tf.Tensor of shape (N, d)) – Coordinates/features.
Z (tf.Tensor of shape (N)) – Atomic numbers of neighbor atoms.

Returns:

layer – The free energy of each complex in batch

Return type:

tf.Tensor of shape (B)

class NeighborList(*args, **kwargs)[source]¶

Computes a neighbor-list in Tensorflow.

Neighbor-lists (also called Verlet Lists) are a tool for grouping atoms which are close to each other spatially. This layer computes a Neighbor List from a provided tensor of atomic coordinates. You can think of this as a general “k-means” layer, but optimized for the case k==3.

TODO(rbharath): Make this layer support batching.

__init__(N_atoms, M_nbrs, ndim, nbr_cutoff, start, stop, **kwargs)[source]¶

Parameters:

N_atoms (int) – Maximum number of atoms this layer will neighbor-list.
M_nbrs (int) – Maximum number of spatial neighbors possible for an atom.
ndim (int) – Dimensionality of the space the atoms live in (typically 3, but higher-dimensional descriptors for atoms are sometimes useful).
nbr_cutoff (float) – Length (presumably in Angstroms) at which atom boxes are gridded.
start (float) – Lower bound of the grid along each axis (see get_cells()).
stop (float) – Upper bound (exclusive) of the grid along each axis (see get_cells()).

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

compute_nbr_list(coords)[source]¶

Get closest neighbors for atoms.

Needs to handle padding for atoms with no neighbors.

Parameters:: coords (tf.Tensor) – Shape (N_atoms, ndim)
Returns:: nbr_list – Shape (N_atoms, M_nbrs) of atom indices
Return type:: tf.Tensor
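For small systems, a brute-force O(N_atoms^2) search is a handy reference for checking a neighbor list. The sketch below swaps in that technique; it is not the cell-based method this layer uses. The function name is hypothetical, and so is the convention of padding missing neighbors with -1.

```python
import numpy as np

def brute_force_nbr_list(coords, M_nbrs, nbr_cutoff, pad_value=-1):
    """Hypothetical O(N^2) reference: indices of up to M_nbrs closest atoms
    within nbr_cutoff of each atom, padded with pad_value."""
    n_atoms = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]  # (N, N, ndim)
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))      # (N, N)
    np.fill_diagonal(dist, np.inf)                  # an atom is not its own neighbor
    nbr_list = np.full((n_atoms, M_nbrs), pad_value, dtype=int)
    for i in range(n_atoms):
        order = np.argsort(dist[i])                 # closest atoms first
        close = [j for j in order[:M_nbrs] if dist[i, j] <= nbr_cutoff]
        nbr_list[i, :len(close)] = close            # remaining slots stay padded
    return nbr_list

coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [5.0, 0.0, 0.0]])
nbrs = brute_force_nbr_list(coords, M_nbrs=2, nbr_cutoff=1.5)
```

Atoms 0 and 1 are each other's only neighbor within the cutoff, and atom 2 has none, so its row is entirely padding.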

get_atoms_in_nbrs(coords, cells)[source]¶

Get the atoms in neighboring cells for each cell.

Returns:: atoms_in_nbrs – Shape (N_atoms, n_nbr_cells, M_nbrs)

get_closest_atoms(coords, cells)[source]¶

For each cell, find M_nbrs closest atoms.

Let N_atoms be the number of atoms.

Parameters:

coords (tf.Tensor) – (N_atoms, ndim) shape.
cells (tf.Tensor) – (n_cells, ndim) shape.

Returns:

closest_inds – Of shape (n_cells, M_nbrs)

Return type:

tf.Tensor

get_cells_for_atoms(coords, cells)[source]¶

Compute the cells each atom belongs to.

Parameters:

coords (tf.Tensor) – Shape (N_atoms, ndim)
cells (tf.Tensor) – (n_cells, ndim) shape.

Returns:

cells_for_atoms – Shape (N_atoms, 1)

Return type:

tf.Tensor

get_neighbor_cells(cells)[source]¶

Compute neighbors of cells in grid.

# TODO(rbharath): Do we need to handle periodic boundary conditions properly here?
# TODO(rbharath): This doesn’t handle boundaries well. We hard-code looking for n_nbr_cells neighbors, which isn’t right for boundary cells in the cube.

Parameters:: cells (tf.Tensor) – (n_cells, ndim) shape.
Returns:: nbr_cells – (n_cells, n_nbr_cells)
Return type:: tf.Tensor

get_cells()[source]¶

Returns the locations of all grid points in box.

Suppose start is -10 Angstrom, stop is 10 Angstrom, and nbr_cutoff is 1. Then this method would return a list of length 20^3 = 8000 whose entries are [(-10, -10, -10), (-10, -10, -9), …, (9, 9, 9)].

Returns:: cells – (n_cells, ndim) shape.
Return type:: tf.Tensor
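The grid in the example above can be reproduced with NumPy. This is a hypothetical sketch; get_cells_sketch is not part of DeepChem. With a non-integer nbr_cutoff, floating-point rounding in np.arange can change whether a point near the upper bound is included.

```python
import numpy as np

def get_cells_sketch(start, stop, nbr_cutoff, ndim=3):
    """Hypothetical NumPy version of the grid described above."""
    axis = np.arange(start, stop, nbr_cutoff)            # grid points along one axis
    mesh = np.meshgrid(*([axis] * ndim), indexing="ij")  # last axis varies fastest
    return np.stack([m.ravel() for m in mesh], axis=1)   # (n_cells, ndim)

cells = get_cells_sketch(-10, 10, 1)
```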

class AtomicConvolution(*args, **kwargs)[source]¶

Implements the atomic convolutional transform introduced in

Gomes, Joseph, et al. “Atomic convolutional networks for predicting protein-ligand binding affinity.” arXiv preprint arXiv:1703.10603 (2017).

At a high level, this transform performs a graph convolution on the nearest neighbors graph in 3D space.

__init__(atom_types=None, radial_params=[], boxsize=None, **kwargs)[source]¶

Atomic convolution layer

N = max_num_atoms, M = max_num_neighbors, B = batch_size, d = num_features, l = num_radial_filters * num_atom_types

Parameters:

atom_types (list or None) – Of length a, where a is the number of atom types for filtering.
radial_params (list) – Of length l, where l is the number of radial filters learned.
boxsize (float or None) – Simulation box length [Angstrom].

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

Parameters:

X (tf.Tensor of shape (B, N, d)) – Coordinates/features.
Nbrs (tf.Tensor of shape (B, N, M)) – Neighbor list.
Nbrs_Z (tf.Tensor of shape (B, N, M)) – Atomic numbers of neighbor atoms.

Returns:

layer – A new tensor representing the output of the atomic conv layer

Return type:

tf.Tensor of shape (B, N, l)

radial_symmetry_function(R, rc, rs, e)[source]¶

Calculates radial symmetry function.

B = batch_size, N = max_num_atoms, M = max_num_neighbors, d = num_filters

Parameters:

R (tf.Tensor of shape (B, N, M)) – Distance matrix.
rc (float) – Interaction cutoff [Angstrom].
rs (float) – Gaussian distance matrix mean.
e (float) – Gaussian distance matrix width.

Returns:

retval – Radial symmetry function (before summation)

Return type:

tf.Tensor of shape (B, N, M)

radial_cutoff(R, rc)[source]¶

Calculates radial cutoff matrix.

B = batch_size, N = max_num_atoms, M = max_num_neighbors

Parameters:

R (tf.Tensor of shape (B, N, M)) – Distance matrix.
rc (tf.Variable) – Interaction cutoff [Angstrom].

Returns:

FC [B, N, M] – Radial cutoff matrix.

Return type:

tf.Tensor

gaussian_distance_matrix(R, rs, e)[source]¶

Calculates gaussian distance matrix.

B = batch_size, N = max_num_atoms, M = max_num_neighbors

Parameters:

R (tf.Tensor of shape (B, N, M)) – Distance matrix.
rs (tf.Variable) – Gaussian distance matrix mean.
e (tf.Variable) – Gaussian distance matrix width (e = .5/std**2).

Returns:

retval [B, N, M] – Gaussian distance matrix.

Return type:

tf.Tensor
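Taken together, radial_cutoff, gaussian_distance_matrix and radial_symmetry_function can be sketched in NumPy as below. The Gaussian form exp(-e * (R - rs)**2) follows from the width note e = .5/std**2. The cosine cutoff 0.5 * (cos(pi * R / rc) + 1) is an assumption: it is taken from the atomic convolution paper cited above, not stated on this page.

```python
import numpy as np

def radial_cutoff(R, rc):
    """Assumed cosine cutoff: 0.5 * (cos(pi * R / rc) + 1) for R <= rc, else 0."""
    return np.where(R <= rc, 0.5 * (np.cos(np.pi * R / rc) + 1.0), 0.0)

def gaussian_distance_matrix(R, rs, e):
    """Gaussian centered at rs with width parameter e = 0.5 / std**2."""
    return np.exp(-e * (R - rs) ** 2)

def radial_symmetry_function(R, rc, rs, e):
    """Elementwise product of the two, before summation over neighbors."""
    return gaussian_distance_matrix(R, rs, e) * radial_cutoff(R, rc)

# A toy (B, N, M) = (1, 1, 3) distance matrix in Angstrom.
R = np.array([[[0.0, 6.0, 13.0]]])
out = radial_symmetry_function(R, rc=12.0, rs=0.0, e=0.5)
```

Distances beyond rc contribute exactly zero, and the cutoff halves its value at R = rc / 2.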

distance_tensor(X, Nbrs, boxsize, B, N, M, d)[source]¶

Calculates distance tensor for batch of molecules.

B = batch_size, N = max_num_atoms, M = max_num_neighbors, d = num_features

Parameters:

X (tf.Tensor of shape (B, N, d)) – Coordinates/features tensor.
Nbrs (tf.Tensor of shape (B, N, M)) – Neighbor list tensor.
boxsize (float or None) – Simulation box length [Angstrom].

Returns:

D – Coordinates/features distance tensor.

Return type:

tf.Tensor of shape (B, N, M, d)

distance_matrix(D)[source]¶

Calculates the distance matrix from the distance tensor.

B = batch_size, N = max_num_atoms, M = max_num_neighbors, d = num_features

Parameters:: D (tf.Tensor of shape (B, N, M, d)) – Distance tensor.
Returns:: R – Distance matrix.
Return type:: tf.Tensor of shape (B, N, M)
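Here is a NumPy sketch of distance_tensor and distance_matrix. It makes two assumptions, neither confirmed by this page. First, D holds each neighbor's coordinates minus the central atom's coordinates. Second, a non-None boxsize applies the minimum-image convention for a periodic cubic box.

```python
import numpy as np

def distance_tensor(X, Nbrs, boxsize=None):
    """Sketch: D[b, i, m] = X[b, Nbrs[b, i, m]] - X[b, i]."""
    B = X.shape[0]
    batch_idx = np.arange(B)[:, None, None]  # broadcasts against Nbrs (B, N, M)
    neighbor_coords = X[batch_idx, Nbrs]     # (B, N, M, d)
    D = neighbor_coords - X[:, :, None, :]   # (B, N, M, d)
    if boxsize is not None:
        # Assumed minimum-image convention for a periodic box of side boxsize.
        D = D - np.round(D / boxsize) * boxsize
    return D

def distance_matrix(D):
    """Euclidean norm over the last axis: (B, N, M, d) -> (B, N, M)."""
    return np.sqrt(np.sum(D * D, axis=-1))

X = np.array([[[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]])  # B=1, N=2, d=3
Nbrs = np.array([[[1], [0]]])                       # M=1: each atom's single neighbor
R = distance_matrix(distance_tensor(X, Nbrs))
```

In this example both entries of R equal 5.0, the length of the (3, 4, 0) offset.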

class AlphaShareLayer(*args, **kwargs)[source]¶

Part of a sluice network. Adds alpha parameters to control sharing between the main and auxiliary tasks.

Factory method AlphaShare should be used for construction

Parameters:

in_layers (list of Layers or tensors) – tensors in list must be the same size and list must include two or more tensors

Returns:

out_tensor (a tensor with shape [len(in_layers), x, y] where x, y were the original layer dimensions)
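As an illustration, the sharing step can be modeled as a linear mix of the K inputs with a K x K matrix of alpha parameters. This model yields the [len(in_layers), x, y] output shape stated above. It is a hypothetical NumPy sketch. This page does not show how DeepChem's layer initializes the alphas or whether it partitions its inputs first.

```python
import numpy as np

def alpha_share(in_layers, alphas):
    """Hypothetical sketch: output[i] = sum_j alphas[i, j] * in_layers[j].

    in_layers: list of K arrays of identical shape (x, y); alphas: (K, K).
    Returns an array of shape (K, x, y).
    """
    stacked = np.stack(in_layers, axis=0)                  # (K, x, y)
    return np.tensordot(alphas, stacked, axes=([1], [0]))  # (K, x, y)

main = np.ones((2, 3))
aux = np.full((2, 3), 3.0)
alphas = np.array([[0.75, 0.25],
                   [0.25, 0.75]])  # hypothetical mixing weights
out = alpha_share([main, aux], alphas)
```

Each task's output is now a blend of both tasks' representations: out[0] is 1.5 everywhere and out[1] is 2.5 everywhere.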

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class SluiceLoss(*args, **kwargs)[source]¶

Calculates the loss in a Sluice Network. Every input into an AlphaShare should be used in SluiceLoss.

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class BetaShare(*args, **kwargs)[source]¶

Part of a sluice network. Adds beta parameters to control which layer outputs are used for prediction.

Parameters:: in_layers (list of Layers or tensors) – tensors in list must be the same size and list must include two or more tensors
Returns:: output_layers – Layer outputs weighted by the beta parameters.
Return type:: list of Layers or tensors with same size as in_layers

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶: All input layers must be the same size.

class ANIFeat(*args, **kwargs)[source]¶

Performs a transform from 3D coordinates to ANI symmetry functions.

__init__(max_atoms=23, radial_cutoff=4.6, angular_cutoff=3.1, radial_length=32, angular_length=8, atom_cases=[1, 6, 7, 8, 16], atomic_number_differentiated=True, coordinates_in_bohr=True, **kwargs)[source]¶: Only X can be transformed

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

call(inputs)[source]¶: Input layers should be of dtype tf.float32 and shape (None, self.max_atoms, 4).

distance_matrix(coordinates, flags)[source]¶: Generate distance matrix

distance_cutoff(d, cutoff, flags)[source]¶: Generate distance matrix with trainable cutoff

radial_symmetry(d_cutoff, d, atom_numbers)[source]¶: Radial Symmetry Function

angular_symmetry(d_cutoff, d, atom_numbers, coordinates)[source]¶: Angular Symmetry Function
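The helpers above follow the usual ANI recipe: compute pairwise distances, damp them with a smooth cutoff, and expand them into symmetry functions. The NumPy sketch below shows the standard radial term, assuming the common cosine cutoff f_c(R) = 0.5 cos(πR/R_c) + 0.5 inside R_c and evenly spaced shell centres R_s. The width `eta=16.0`, the shell spacing, and the function names are illustrative assumptions, and the sketch ignores the per-element differentiation that ANIFeat applies when atomic_number_differentiated=True.

```python
import numpy as np

def cosine_cutoff(d, cutoff):
    # Decays smoothly from 1 at d = 0 to 0 at d = cutoff; zero beyond the cutoff.
    return np.where(d <= cutoff, 0.5 * np.cos(np.pi * d / cutoff) + 0.5, 0.0)

def radial_symmetry(coords, radial_cutoff=4.6, radial_length=32, eta=16.0):
    """G[i, k] = sum_{j != i} exp(-eta * (R_ij - R_s[k])**2) * f_c(R_ij)."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.linalg.norm(diff, axis=-1)                     # (n_atoms, n_atoms) distances
    fc = cosine_cutoff(d, radial_cutoff)
    np.fill_diagonal(fc, 0.0)                             # exclude self-pairs
    shells = np.linspace(0.0, radial_cutoff, radial_length, endpoint=False)
    gauss = np.exp(-eta * (d[:, :, None] - shells) ** 2)  # (n, n, radial_length)
    return np.sum(gauss * fc[:, :, None], axis=1)         # (n_atoms, radial_length)

coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
G = radial_symmetry(coords)
print(G.shape)  # (3, 32)
```

The third atom lies beyond the cutoff of both others, so its radial features are all zero; the cutoff is what keeps these descriptors local.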

class GraphEmbedPoolLayer(*args, **kwargs)[source]¶

GraphCNNPool Layer from Robust Spatial Filtering with Graph Convolutional Neural Networks https://arxiv.org/abs/1703.00792

This is a learnable pool operation. It constructs a new adjacency matrix for a graph with a specified number of nodes.

This differs from our other pool operations which set vertices to a function value without altering the adjacency matrix.

$$V_{emb} = \mathrm{SpatialGraphCNN}(V_{in})$$

$$V_{out} = \sigma(V_{emb})^{T} V_{in}$$

$$A_{out} = V_{emb}^{T} A_{in} V_{emb}$$
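The three equations can be sketched for a single, unbatched graph with one adjacency matrix. This NumPy version assumes V_emb is already given (a stand-in for the SpatialGraphCNN output) and takes σ to be a softmax over the output-vertex axis, so each input vertex spreads unit weight across the pooled vertices. It follows the equations literally; the name `graph_embed_pool` is hypothetical.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_embed_pool(V_in, A_in, V_emb):
    """V_in: (N_in, C), A_in: (N_in, N_in), V_emb: (N_in, N_out) -> pooled graph."""
    S = softmax(V_emb, axis=1)        # sigma(V_emb): each input vertex -> weights over N_out
    V_out = S.T @ V_in                # (N_out, C) pooled vertex features
    A_out = V_emb.T @ A_in @ V_emb    # (N_out, N_out), as in the third equation
    return V_out, A_out

rng = np.random.default_rng(0)
V_in = rng.random((5, 4))
A_in = (rng.random((5, 5)) > 0.5).astype(float)
V_emb = rng.random((5, 2))
V_out, A_out = graph_embed_pool(V_in, A_in, V_emb)
print(V_out.shape, A_out.shape)  # (2, 4) (2, 2)
```

Because each row of σ(V_emb) sums to one, the column sums of V_out equal those of V_in: pooling redistributes vertex features rather than creating or discarding them.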

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

Parameters:

num_filters (int) – Number of filters to have in the output.
in_layers (list of Layers or tensors) – [V, A, mask]. V are the vertex features, which must be of shape (batch, vertex, channel). A are the adjacency matrices for each graph, of shape (batch, from_vertex, adj_matrix, to_vertex). mask is optional, to be used when not every graph has the same number of vertices.

Returns:

Returns a tf.tensor with a graph convolution applied
The shape will be (batch, vertex, self.num_filters).

class GraphCNN(*args, **kwargs)[source]¶

GraphCNN Layer from Robust Spatial Filtering with Graph Convolutional Neural Networks https://arxiv.org/abs/1703.00792

Spatial-domain convolutions can be defined as

$$H = h_0 I + h_1 A + h_2 A^2 + \dots + h_k A^k, \quad H \in \mathbb{R}^{N \times N}$$

We approximate it by

$$H \approx h_0 I + h_1 A$$

We can define a convolution as applying several of these linear filters over edges of different types (think up, down, left, right, and diagonal in images), where each edge type has its own adjacency matrix:

$$H \approx h_0 I + h_1 A_1 + h_2 A_2 + \dots + h_{L-1} A_{L-1}$$

$$V_{out} = \sum_{c=1}^{C} H^{c} V^{c} + b$$
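The filter definition above can be made concrete with a small NumPy sketch for one unbatched graph. It assumes the adjacency matrices are stacked as (edge_type, N, N) rather than the layer's (batch, from_vertex, adj_matrix, to_vertex) layout, and that every (output filter, input channel) pair has its own taps h_0, ..., h_{L-1}. The function `graph_cnn` and this parameter layout are hypothetical.

```python
import numpy as np

def graph_cnn(V, A, h, b):
    """
    V: (N, C) vertex features for one graph.
    A: (L - 1, N, N) one adjacency matrix per edge type (hypothetical layout).
    h: (F, C, L) filter taps; h[f, c, 0] scales I, h[f, c, l] scales A[l - 1].
    b: (F,) bias per output filter. Returns V_out of shape (N, F).
    """
    N = V.shape[0]
    mats = np.concatenate([np.eye(N)[None], A], axis=0)   # (L, N, N): I, A_1, ..., A_{L-1}
    H = np.einsum("fcl,lij->fcij", h, mats)                 # H^c for every output filter f
    return np.einsum("fcij,jc->if", H, V) + b               # sum_c H^c V^c + b

rng = np.random.default_rng(0)
V = rng.random((3, 2))                                     # 3 vertices, C = 2 channels
A = np.array([[[0, 1, 0], [1, 0, 1], [0, 1, 0]]], float)   # one edge type, so L = 2
h = np.zeros((2, 2, 2))
h[0, 0, 0] = h[1, 1, 0] = 1.0                              # identity taps only
print(np.allclose(graph_cnn(V, A, h, np.zeros(2)), V))     # True: each filter copies a channel
```

Setting a non-identity tap such as h[0, 0, 1] instead makes filter 0 aggregate channel 0 over neighbours along that edge type.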

__init__(num_filters, **kwargs)[source]¶

Parameters:

num_filters (int) – Number of filters to have in the output.
in_layers (list of Layers or tensors) – [V, A, mask]. V are the vertex features, which must be of shape (batch, vertex, channel). A are the adjacency matrices for each graph, of shape (batch, from_vertex, adj_matrix, to_vertex). mask is optional, to be used when not every graph has the same number of vertices.

Returns:

A tf.Tensor with a graph convolution applied. The shape will be (batch, vertex, self.num_filters).

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc., is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class Highway(*args, **kwargs)[source]¶

Create a highway layer:

y = H(x) * T(x) + x * (1 - T(x))

where H(x) = activation_fn(matmul(W_H, x) + b_H) is the non-linearly transformed output and T(x) = sigmoid(matmul(W_T, x) + b_T) is the transform gate.

Implementation based on the paper

Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. “Highway networks.” arXiv preprint arXiv:1505.00387 (2015).

This layer expects its input to be a two-dimensional tensor of shape (batch size, # input features). The output has the same shape.
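The highway equation can be sketched in NumPy as below. The formula writes matmul(W, x) for a column vector x; with batch-major rows the sketch computes x @ W, which is the same map up to transposing W. Square weight matrices keep the output the same shape as the input, and the default activation mirrors the layer's 'relu'. The function `highway` and its argument names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway(x, W_H, b_H, W_T, b_T, activation_fn=lambda z: np.maximum(z, 0.0)):
    """x: (batch, n_features); square W_H and W_T keep the output shape equal to x."""
    H = activation_fn(x @ W_H + b_H)   # non-linear transform H(x)
    T = sigmoid(x @ W_T + b_T)         # transform gate T(x), values in (0, 1)
    return H * T + x * (1.0 - T)       # carry gate is 1 - T(x)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 3))
y = highway(x, W, np.zeros(3), W, np.full(3, -50.0))
print(np.allclose(y, x))  # True: the gate is nearly closed, so x is carried through
```

A strongly negative gate bias makes the layer start close to the identity, which is why highway layers are often initialised with negative b_T.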

__init__(activation_fn='relu', biases_initializer='zeros', weights_initializer=None, **kwargs)[source]¶

Parameters:

activation_fn (object) – the TensorFlow activation function to apply to the output
biases_initializer (callable object) – the initializer for bias values. This may be None, in which case the layer will not include biases.
weights_initializer (callable object) – the initializer for weight values

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc., is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class WeaveLayer(*args, **kwargs)[source]¶

This class implements the core Weave convolution from the Google graph convolution paper [1].

This layer keeps atom features and bond features separate. Here, bond features are also called pair features. It implements four types of transformation: atom->atom, atom->pair, pair->atom, and pair->pair.

Examples

This layer expects 4 inputs in a list of the form [atom_features, pair_features, pair_split, atom_to_pair]. We’ll walk through the structure of these inputs. Let’s start with some basic definitions.

>>> import deepchem as dc
>>> import numpy as np

Suppose you have a batch of molecules

>>> smiles = ["CCC", "C"]

Note that there are 4 atoms in total in this system. This layer expects its input molecules to be batched together.

>>> total_n_atoms = 4

Let’s suppose that we have a featurizer that computes n_atom_feat features per atom.

>>> n_atom_feat = 75

Then conceptually, atom_feat is the array of shape (total_n_atoms, n_atom_feat) of atomic features. For simplicity, let’s just go with a random such matrix.

>>> atom_feat = np.random.rand(total_n_atoms, n_atom_feat)

Let’s suppose we have n_pair_feat pairwise features

>>> n_pair_feat = 14

For each molecule, we compute a matrix of shape (n_atoms*n_atoms, n_pair_feat) of pairwise features for each pair of atoms in the molecule. Let’s construct this conceptually for our example.

>>> pair_feat = [np.random.rand(3*3, n_pair_feat), np.random.rand(1*1, n_pair_feat)]
>>> pair_feat = np.concatenate(pair_feat, axis=0)
>>> pair_feat.shape
(10, 14)

pair_split is an index into pair_feat which tells us which atom each row belongs to. In our case, we have

>>> pair_split = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])

That is, the first 9 entries belong to “CCC” and the last entry to “C”. The final input, atom_to_pair, goes further than pair_split and tells us the precise pair of atoms each pair feature belongs to. In our case

>>> atom_to_pair = np.array([[0, 0],
...                          [0, 1],
...                          [0, 2],
...                          [1, 0],
...                          [1, 1],
...                          [1, 2],
...                          [2, 0],
...                          [2, 1],
...                          [2, 2],
...                          [3, 3]])

Let’s now define the actual layer

>>> layer = WeaveLayer()

And invoke it

>>> [A, P] = layer([atom_feat, pair_feat, pair_split, atom_to_pair])

The weave layer produces new atom/pair features. Let’s check their shapes

>>> A = np.array(A)
>>> A.shape
(4, 50)
>>> P = np.array(P)
>>> P.shape
(10, 50)

The 4 is total_n_atoms and the 10 is the total number of pairs. Where does 50 come from? It’s from the default arguments n_atom_output_feat and n_pair_output_feat.
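The pair_split and atom_to_pair arrays in the example follow a simple pattern: every ordered pair (i, j) of atoms within the same molecule, including i == j, gets one row, and atom indices are offset into the batched atom_feat array. The helper `build_pair_indices` below is a hypothetical sketch that reproduces the arrays above from per-molecule atom counts; DeepChem's featurizers may build them differently (for example, by limiting pairs to a maximum distance).

```python
import numpy as np

def build_pair_indices(n_atoms_per_mol):
    """Hypothetical helper: all ordered intra-molecule atom pairs for a batch."""
    atom_to_pair = []
    offset = 0
    for n in n_atoms_per_mol:
        for i in range(n):
            for j in range(n):
                atom_to_pair.append([offset + i, offset + j])
        offset += n                        # next molecule's atoms start after this one
    atom_to_pair = np.array(atom_to_pair, dtype=np.int64).reshape(-1, 2)
    pair_split = atom_to_pair[:, 0]        # the atom each pair row belongs to
    return pair_split, atom_to_pair

pair_split, atom_to_pair = build_pair_indices([3, 1])   # "CCC" and "C"
print(pair_split)  # [0 0 0 1 1 1 2 2 2 3]
```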

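References

[1] Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of Computer-Aided Molecular Design 30.8 (2016): 595-608.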

__init__(n_atom_input_feat: int = 75, n_pair_input_feat: int = 14, n_atom_output_feat: int = 50, n_pair_output_feat: int = 50, n_hidden_AA: int = 50, n_hidden_PA: int = 50, n_hidden_AP: int = 50, n_hidden_PP: int = 50, update_pair: bool = True, init: str = 'glorot_uniform', activation: str = 'relu', batch_normalize: bool = True, batch_normalize_kwargs: Dict = {'renorm': True}, **kwargs)[source]¶

Parameters:

n_atom_input_feat (int, optional (default 75)) – Number of features for each atom in input.
n_pair_input_feat (int, optional (default 14)) – Number of features for each pair of atoms in input.
n_atom_output_feat (int, optional (default 50)) – Number of features for each atom in output.
n_pair_output_feat (int, optional (default 50)) – Number of features for each pair of atoms in output.
n_hidden_AA (int, optional (default 50)) – Number of units (convolution depths) in the atom-to-atom hidden layer.
n_hidden_PA (int, optional (default 50)) – Number of units (convolution depths) in the pair-to-atom hidden layer.
n_hidden_AP (int, optional (default 50)) – Number of units (convolution depths) in the atom-to-pair hidden layer.
n_hidden_PP (int, optional (default 50)) – Number of units (convolution depths) in the pair-to-pair hidden layer.
update_pair (bool, optional (default True)) – Whether to calculate for pair features, could be turned off for last layer
init (str, optional (default 'glorot_uniform')) – Weight initialization for filters.
activation (str, optional (default 'relu')) – Activation function applied
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying activation functions on convolutional layers.
batch_normalize_kwargs (Dict, optional (default {'renorm': True})) – Batch normalization is a complex layer which has many potential arguments which change behavior. This layer accepts user-defined parameters which are passed to all BatchNormalization layers in WeaveModel, WeaveLayer, and WeaveGather.

get_config() → Dict[source]¶: Returns config dictionary for this layer.

build(input_shape)[source]¶

Construct internal trainable weights.

Parameters:: input_shape (tuple) – Ignored since we don’t need the input shape to create internal weights.

call(inputs: List) → List[source]¶

Creates weave tensors.

Parameters:: inputs (List) – Should contain 4 tensors [atom_features, pair_features, pair_split, atom_to_pair]

class WeaveGather(*args, **kwargs)[source]¶

Implements the weave-gathering section of weave convolutions.

Implements the gathering layer from [1]. The weave gathering layer gathers per-atom features to create a molecule-level fingerprint in a weave convolutional network. This layer can also perform Gaussian histogram expansion as detailed in [1]. Note that the gathering function here is simply addition, as in [1].

Examples

This layer expects 2 inputs in a list of the form [atom_features, atom_split]. We’ll walk through the structure of these inputs. Let’s start with some basic definitions.

>>> import deepchem as dc
>>> import numpy as np

Suppose you have a batch of molecules

>>> smiles = ["CCC", "C"]

Note that there are 4 atoms in total in this system. This layer expects its input molecules to be batched together.

>>> total_n_atoms = 4

Let’s suppose that we have n_atom_feat features per atom.

>>> n_atom_feat = 75

Then conceptually, atom_feat is the array of shape (total_n_atoms, n_atom_feat) of atomic features. For simplicity, let’s just go with a random such matrix.

>>> atom_feat = np.random.rand(total_n_atoms, n_atom_feat)

We then need to provide a mapping from each atom to the index of the molecule it belongs to. In our case this would be

>>> atom_split = np.array([0, 0, 0, 1])

Let’s now define the actual layer

>>> gather = WeaveGather(batch_size=2, n_input=n_atom_feat)
>>> output_molecules = gather([atom_feat, atom_split])
>>> len(output_molecules)
2
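Since the gathering function is simple addition, the core of the gather step is a per-molecule sum indexed by atom_split. The NumPy sketch below shows only that step, roughly what happens without the Gaussian expansion; the helper name `sum_gather` is hypothetical.

```python
import numpy as np

def sum_gather(atom_feat, atom_split, n_molecules):
    """Hypothetical helper: sum per-atom feature rows into one row per molecule."""
    out = np.zeros((n_molecules, atom_feat.shape[1]))
    np.add.at(out, atom_split, atom_feat)   # out[atom_split[i]] += atom_feat[i]
    return out

atom_feat = np.arange(8, dtype=float).reshape(4, 2)  # 4 atoms, 2 features each
atom_split = np.array([0, 0, 0, 1])                  # first 3 atoms -> "CCC", last -> "C"
molecules = sum_gather(atom_feat, atom_split, 2)
print(molecules)  # molecule 0: [6, 9], molecule 1: [6, 7]
```

np.add.at is used instead of out[atom_split] += atom_feat because the latter would apply only one update per repeated index.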

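References

[1] Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of Computer-Aided Molecular Design 30.8 (2016): 595-608.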

Note

This class requires tensorflow_probability to be installed.

__init__(batch_size: int, n_input: int = 128, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, init: str = 'glorot_uniform', activation: str = 'tanh', **kwargs)[source]¶

Parameters:

batch_size (int) – number of molecules in a batch
n_input (int, optional (default 128)) – number of features for each input atom
gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by Gaussian histogram
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the original dimensions of the input by using a linear layer with specified activation function. Note that this compression was not in the original paper, but was present in the original DeepChem implementation so is left present for backwards compatibility.
init (str, optional (default 'glorot_uniform')) – Weight initialization for filters if compress_post_gaussian_expansion is True.
activation (str, optional (default 'tanh')) – Activation function applied for filters if compress_post_gaussian_expansion is True. Should be recognizable by tf.keras.activations.

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs: List) → List[source]¶

Creates weave tensors.

Parameters:: inputs (List) – Should contain 2 tensors [atom_features, atom_split]
Returns:: output_molecules – Each entry in this list is of shape (self.n_inputs,)
Return type:: List

gaussian_histogram(x)[source]¶

Expands input into a set of gaussian histogram bins.

Parameters:: x (tf.Tensor) – Of shape (N, n_feat)

Examples

This method uses 11 bins spanning portions of a Gaussian with zero mean and unit standard deviation.

>>> gaussian_memberships = [(-1.645, 0.283), (-1.080, 0.170),
...                         (-0.739, 0.134), (-0.468, 0.118),
...                         (-0.228, 0.114), (0., 0.114),
...                         (0.228, 0.114), (0.468, 0.118),
...                         (0.739, 0.134), (1.080, 0.170),
...                         (1.645, 0.283)]

We construct a Gaussian at gaussian_memberships[i][0] with standard deviation gaussian_memberships[i][1]. Each feature in x is assigned the probability of falling in each Gaussian, and probabilities are normalized across the 11 different Gaussians.

Returns:: outputs – Of shape (N, 11*n_feat)
Return type:: tf.Tensor
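The membership computation described above can be sketched in NumPy: evaluate each of the 11 Gaussian densities at every feature value, then normalize across the 11 Gaussians. This follows the description only; details such as any per-Gaussian rescaling in DeepChem's implementation may differ. The output keeps the 11 memberships of each feature adjacent, giving shape (N, 11*n_feat). Very large inputs (far outside roughly [-5, 5]) could underflow every density to zero, which this sketch does not guard against.

```python
import numpy as np

GAUSSIAN_MEMBERSHIPS = [(-1.645, 0.283), (-1.080, 0.170),
                        (-0.739, 0.134), (-0.468, 0.118),
                        (-0.228, 0.114), (0., 0.114),
                        (0.228, 0.114), (0.468, 0.118),
                        (0.739, 0.134), (1.080, 0.170),
                        (1.645, 0.283)]

def gaussian_histogram(x):
    """x: (N, n_feat) -> (N, 11 * n_feat) soft bin memberships."""
    mu = np.array([m for m, _ in GAUSSIAN_MEMBERSHIPS])
    sigma = np.array([s for _, s in GAUSSIAN_MEMBERSHIPS])
    z = (x[:, :, None] - mu) / sigma                           # (N, n_feat, 11)
    pdf = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    probs = pdf / pdf.sum(axis=2, keepdims=True)               # normalize across the 11 Gaussians
    return probs.reshape(x.shape[0], -1)                       # 11 consecutive entries per feature

x = np.random.default_rng(0).standard_normal((4, 3))
out = gaussian_histogram(x)
print(out.shape)  # (4, 33)
```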

class DTNNEmbedding(*args, **kwargs)[source]¶

__init__(n_embedding=30, periodic_table_length=30, init='glorot_uniform', **kwargs)[source]¶

Parameters:

n_embedding (int, optional) – Number of features for each atom
periodic_table_length (int, optional) – Number of rows in the embedding table, i.e. how many elements can be embedded (for reference, atomic number 83 is Bi).
init (str, optional) – Weight initialization for filters.
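Conceptually, this layer is a lookup table with one learnable row of n_embedding features per element. The NumPy sketch assumes atomic numbers index rows directly, so every atomic number must be smaller than periodic_table_length; the random table stands in for trained weights, and the variable names are hypothetical.

```python
import numpy as np

n_embedding, periodic_table_length = 30, 30       # defaults from the signature
rng = np.random.default_rng(0)
# One learnable row per element; row k embeds atomic number k (assumed layout).
embedding_table = rng.standard_normal((periodic_table_length, n_embedding))

atom_number = np.array([6, 6, 6, 1, 1])           # e.g. three carbons and two hydrogens
atom_features = embedding_table[atom_number]      # lookup: (n_atoms, n_embedding)
print(atom_features.shape)  # (5, 30)
```

Atoms of the same element start from identical features; the DTNNStep layers that follow are what make them depend on their surroundings.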

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶: parent layers: atom_number

class DTNNStep(*args, **kwargs)[source]¶

__init__(n_embedding=30, n_distance=100, n_hidden=60, init='glorot_uniform', activation='tanh', **kwargs)[source]¶

Parameters:

n_embedding (int, optional) – Number of features for each atom
n_distance (int, optional) – granularity of distance matrix
n_hidden (int, optional) – Number of nodes in hidden layer
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶: parent layers: atom_features, distance, distance_membership_i, distance_membership_j
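A simplified DTNN-style interaction step can be sketched from the four inputs named above. Each atom i receives, from every other atom j, a message built from j's projected features multiplied element-wise by a projection of their expanded distance, and the summed messages are added to i's features. The weight names (W_cf, W_df, W_fc) and the exact form are illustrative assumptions; DeepChem's layer may differ in details such as how self-interactions are handled.

```python
import numpy as np

def dtnn_step(atom_features, distance, membership_i, membership_j,
              W_cf, b_cf, W_df, b_df, W_fc, activation=np.tanh):
    atom_hidden = atom_features @ W_cf + b_cf        # (n_atoms, n_hidden)
    dist_hidden = distance @ W_df + b_df             # (n_pairs, n_hidden)
    # Effect of atom j on atom i, modulated by their expanded distance.
    interaction = activation((dist_hidden * atom_hidden[membership_j]) @ W_fc)
    update = np.zeros_like(atom_features)
    np.add.at(update, membership_i, interaction)     # sum over j for each atom i
    return atom_features + update                    # residual update

rng = np.random.default_rng(0)
n_atoms, n_embedding, n_distance, n_hidden = 3, 30, 100, 60   # defaults from the signature
pairs = np.array([(i, j) for i in range(n_atoms) for j in range(n_atoms) if i != j])
membership_i, membership_j = pairs[:, 0], pairs[:, 1]
atom_features = rng.standard_normal((n_atoms, n_embedding))
distance = rng.random((len(pairs), n_distance))   # stand-in for expanded distances
W_cf = rng.standard_normal((n_embedding, n_hidden)) * 0.1
W_df = rng.standard_normal((n_distance, n_hidden)) * 0.1
W_fc = rng.standard_normal((n_hidden, n_embedding)) * 0.1
new_features = dtnn_step(atom_features, distance, membership_i, membership_j,
                         W_cf, np.zeros(n_hidden), W_df, np.zeros(n_hidden), W_fc)
print(new_features.shape)  # (3, 30)
```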

class DTNNGather(*args, **kwargs)[source]¶

__init__(n_embedding=30, n_outputs=100, layer_sizes=[100], output_activation=True, init='glorot_uniform', activation='tanh', **kwargs)[source]¶

Parameters:

n_embedding (int, optional) – Number of features for each atom
n_outputs (int, optional) – Number of features for each molecule (output)
layer_sizes (list of int, optional (default=[100])) – Structure of hidden layer(s)
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶: parent layers: atom_features, atom_membership
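With these inputs, a gather of this kind can be sketched as a per-atom multilayer perceptron (hidden widths from layer_sizes, output width n_outputs) followed by a sum over the atoms of each molecule, indexed by atom_membership. The function `dtnn_gather` is hypothetical, applies the activation to hidden layers only, and leaves out the output_activation option.

```python
import numpy as np

def dtnn_gather(atom_features, atom_membership, n_molecules, weights, biases,
                activation=np.tanh):
    h = atom_features
    for k, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if k < len(weights) - 1:               # activation on hidden layers only
            h = activation(h)
    molecules = np.zeros((n_molecules, h.shape[1]))
    np.add.at(molecules, atom_membership, h)   # sum per-atom outputs per molecule
    return molecules

rng = np.random.default_rng(0)
sizes = [30, 100, 100]                         # n_embedding -> layer_sizes=[100] -> n_outputs
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
atom_features = rng.standard_normal((4, 30))
atom_membership = np.array([0, 0, 0, 1])
out = dtnn_gather(atom_features, atom_membership, 2, weights, biases)
print(out.shape)  # (2, 100)
```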

class DAGLayer(*args, **kwargs)[source]¶

DAG computation layer.

This layer generates a directed acyclic graph for each atom in a molecule. This layer is based on the algorithm from the following paper:

Lusci, Alessandro, Gianluca Pollastri, and Pierre Baldi. “Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules.” Journal of chemical information and modeling 53.7 (2013): 1563-1575.

This layer performs a sort of inward sweep. Recall that for each atom, a DAG is generated that “points inward” to that atom from the undirected molecule graph. Picture this as “picking up” the molecule by that atom, so that the atom becomes the root and the rest of the molecule hangs below it in a natural tree structure. The layer “sweeps inwards” from the leaf nodes of the DAG up to the root atom. This is batched so that the transformation is done for every atom.
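The inward sweep for a single root atom can be sketched in plain Python and NumPy. A breadth-first search gives each atom's distance from the root; atoms are then processed from the deepest leaves inward, and each atom combines its own features with the summed hidden states of neighbours one step farther out. This is a simplification that assumes edges between atoms at equal depth are dropped and uses one hypothetical weight matrix W in place of the layer's hidden layers; the hidden size equals the atom feature size so the concatenation fits W.

```python
from collections import deque
import numpy as np

def inward_sweep(adjacency, atom_feat, root, W, activation=np.tanh):
    """adjacency: neighbour lists; atom_feat: (n_atoms, f); W: (2 * f, f)."""
    n_atoms = len(adjacency)
    depth = [-1] * n_atoms
    depth[root] = 0
    queue = deque([root])
    while queue:                                    # BFS: distance of each atom from the root
        u = queue.popleft()
        for v in adjacency[u]:
            if depth[v] < 0:
                depth[v] = depth[u] + 1
                queue.append(v)
    reachable = [v for v in range(n_atoms) if depth[v] >= 0]
    hidden = {}
    for v in sorted(reachable, key=lambda v: -depth[v]):   # leaves (deepest) first
        # Children in the inward DAG: neighbours one step farther from the root.
        incoming = [hidden[u] for u in adjacency[v] if depth[u] == depth[v] + 1]
        message = np.sum(incoming, axis=0) if incoming else np.zeros(atom_feat.shape[1])
        hidden[v] = activation(np.concatenate([atom_feat[v], message]) @ W)
    return hidden[root]

adjacency = [[1], [0, 2], [1]]                     # propane carbon skeleton C-C-C
atom_feat = np.ones((3, 2))
W = np.vstack([np.eye(2), np.eye(2)])              # hidden = own features + message
print(inward_sweep(adjacency, atom_feat, 1, W, activation=lambda z: z))  # [3. 3.]
```

With the identity activation and this W, the root's state simply accumulates the features of every atom in its DAG, which makes the leaf-to-root order easy to check by hand.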

__init__(n_graph_feat=30, n_atom_feat=75, max_atoms=50, layer_sizes=[100], init='glorot_uniform', activation='relu', dropout=None, batch_size=64, **kwargs)[source]¶

Parameters:

n_graph_feat (int, optional) – Number of features for each node (and the whole graph).
n_atom_feat (int, optional) – Number of features listed per atom.
max_atoms (int, optional) – Maximum number of atoms in molecules.
layer_sizes (list of int, optional(default=[100])) – List of hidden layer size(s): length of this list represents the number of hidden layers, and each element is the width of corresponding hidden layer.
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied.
dropout (float, optional) – Dropout probability in hidden layer(s).
batch_size (int, optional) – number of molecules in a batch.

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶: Construct internal trainable weights.

call(inputs, training=True)[source]¶: parent layers: atom_features, parents, calculation_orders, calculation_masks, n_atoms

class DAGGather(*args, **kwargs)[source]¶

__init__(n_graph_feat=30, n_outputs=30, max_atoms=50, layer_sizes=[100], init='glorot_uniform', activation='relu', dropout=None, **kwargs)[source]¶

DAG vector gathering layer

Parameters:

n_graph_feat (int, optional) – Number of features for each atom.
n_outputs (int, optional) – Number of features for each molecule.
max_atoms (int, optional) – Maximum number of atoms in molecules.
layer_sizes (list of int, optional) – List of hidden layer size(s): length of this list represents the number of hidden layers, and each element is the width of corresponding hidden layer.
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied.
dropout (float, optional) – Dropout probability in the hidden layer(s).

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
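The deferred state creation described above can be sketched without TensorFlow. LazyDense is a hypothetical class that mimics the Keras pattern. Its weights are created in build() from the last dimension of the first input it receives, and __call__ runs build() exactly once, before the first call(). The 0.1 initial weight is an arbitrary placeholder.

```python
class LazyDense:
    """Hypothetical dense layer whose weights depend on the input width."""

    def __init__(self, units):
        self.units = units
        self.built = False
        self.kernel = None

    def build(self, input_shape):
        # The number of input features is only known once an input arrives.
        n_in = input_shape[-1]
        self.kernel = [[0.1] * self.units for _ in range(n_in)]  # placeholder init
        self.built = True

    def call(self, inputs):
        # inputs: list of rows; computes rows @ kernel.
        return [[sum(x * w_row[j] for x, w_row in zip(row, self.kernel))
                 for j in range(self.units)]
                for row in inputs]

    def __call__(self, inputs):
        # As in Keras, build() runs once, before the first call().
        if not self.built:
            self.build((len(inputs), len(inputs[0])))
        return self.call(inputs)


layer = LazyDense(units=2)
out = layer([[1.0, 2.0, 3.0]])
print(len(layer.kernel), len(out[0]))  # 3 2
```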

call(inputs, training=True)[source]¶: inputs – list of parent-layer outputs, in order: atom_features, membership.
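The gathering step can be pictured as a segment sum: each atom's feature vector is added to the row of the molecule given by its membership index, producing one vector per molecule. The pure-Python sketch below assumes that summation is the pooling applied before the layer's dense transformations. segment_sum is a hypothetical helper written for illustration.

```python
def segment_sum(atom_features, membership, n_molecules):
    """Sum atom feature vectors into one vector per molecule.

    atom_features: list of per-atom feature lists (all the same length).
    membership: molecule index of each atom, aligned with atom_features.
    """
    n_feat = len(atom_features[0])
    out = [[0.0] * n_feat for _ in range(n_molecules)]
    for feats, mol in zip(atom_features, membership):
        for k, value in enumerate(feats):
            out[mol][k] += value
    return out


# Three atoms: the first two belong to molecule 0, the third to molecule 1.
atoms = [[1.0, 0.0], [2.0, 1.0], [0.5, 0.5]]
print(segment_sum(atoms, [0, 0, 1], n_molecules=2))  # [[3.0, 1.0], [0.5, 0.5]]
```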

class MessagePassing(*args, **kwargs)[source]¶

General class for message passing neural network (MPNN) layers. The default structure is built according to https://arxiv.org/abs/1511.06391.

__init__(T, message_fn='enn', update_fn='gru', n_hidden=100, **kwargs)[source]¶

Parameters:

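T (int) – Number of message passing steps.
message_fn (str, optional) – Message function in the model. The default, ‘enn’, uses an edge network (EdgeNetwork).
update_fn (str, optional) – Update function in the model. The default, ‘gru’, uses a gated recurrent unit (GatedRecurrentUnit).
n_hidden (int, optional) – Number of hidden units in the passing phase.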

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶: Perform T steps of message passing.
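The T-step loop can be sketched under simplifying assumptions. Node states are scalars, incoming messages are summed, and the learned message and update functions are replaced by hypothetical stand-ins (msg and upd). In this class those functions default to the ‘enn’ and ‘gru’ options. All nodes are updated simultaneously from the previous step's states.

```python
def message_passing(h, edges, T, message_fn, update_fn):
    """Run T rounds of message passing on node states h.

    h: dict node -> state; edges: list of (v, w, edge_feature), with messages flowing w -> v.
    """
    for _ in range(T):
        messages = {v: 0.0 for v in h}
        for v, w, e in edges:
            # Aggregate incoming messages by summation.
            messages[v] += message_fn(h[w], e)
        # Update every node from its old state and its aggregated message.
        h = {v: update_fn(h[v], messages[v]) for v in h}
    return h


def msg(h_w, e):
    # Hypothetical stand-in for a learned message function.
    return e * h_w


def upd(h_v, m_v):
    # Hypothetical stand-in for a learned update function.
    return 0.5 * (h_v + m_v)


states = {0: 1.0, 1: 0.0}
edges = [(0, 1, 1.0), (1, 0, 1.0)]  # one bond, listed in both directions
print(message_passing(states, edges, T=2, message_fn=msg, update_fn=upd))  # {0: 0.5, 1: 0.5}
```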

class EdgeNetwork(*args, **kwargs)[source]¶

Submodule for Message Passing
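In the edge network message function from the MPNN literature, each pair feature vector e_vw is mapped to an n_hidden x n_hidden matrix A(e_vw). Node v then receives the sum of A(e_vw) h_w over its pairs. The plain-Python sketch below illustrates that computation. W and b are hypothetical fixed values standing in for trained weights, and the sketch is not claimed to match this layer's exact tensor layout.

```python
def edge_network(h, pairs, pair_features, W, b, n_hidden):
    """Edge-network messages: m_v = sum over pairs (v, w) of A(e_vw) @ h_w.

    h: list of node states, each a list of n_hidden floats.
    pairs: list of (v, w) index pairs; pair_features: one feature list per pair.
    W: one row per pair feature, each of length n_hidden * n_hidden; b: bias of that length.
    """
    out = [[0.0] * n_hidden for _ in h]
    for (v, w), e in zip(pairs, pair_features):
        # Flat vector e @ W + b, reshaped into an n_hidden x n_hidden matrix A.
        flat = [sum(e[i] * W[i][j] for i in range(len(e))) + b[j]
                for j in range(n_hidden * n_hidden)]
        A = [flat[r * n_hidden:(r + 1) * n_hidden] for r in range(n_hidden)]
        # Message A @ h_w, accumulated into node v.
        for r in range(n_hidden):
            out[v][r] += sum(A[r][c] * h[w][c] for c in range(n_hidden))
    return out


h = [[1.0, 0.0], [0.0, 2.0]]
pairs = [(0, 1), (1, 0)]
feats = [[1.0], [2.0]]
W = [[1.0, 0.0, 0.0, 1.0]]  # maps a scalar feature e to e * identity
b = [0.0, 0.0, 0.0, 0.0]
print(edge_network(h, pairs, feats, W, b, n_hidden=2))  # [[0.0, 2.0], [2.0, 0.0]]
```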

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class GatedRecurrentUnit(*args, **kwargs)[source]¶

Submodule for Message Passing
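The update function can be sketched as a textbook GRU cell. The aggregated message m plays the role of the input, and the node state h is the hidden state. For readability, the sketch below uses scalar states and hypothetical weights. Gate conventions differ between implementations (some swap the roles of z and 1 - z), so the sketch is not claimed to match this layer's exact equations.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_update(h, m, p):
    """One GRU step for a scalar state h given a scalar message m.

    p holds hypothetical scalar weights: W* multiply m, U* multiply h, b* are biases.
    """
    z = sigmoid(p["Wz"] * m + p["Uz"] * h + p["bz"])                # update gate
    r = sigmoid(p["Wr"] * m + p["Ur"] * h + p["br"])                # reset gate
    h_tilde = math.tanh(p["Wh"] * m + p["Uh"] * (r * h) + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_tilde  # blend the old state and the candidate


zeros = {name: 0.0 for name in ("Wz", "Uz", "bz", "Wr", "Ur", "br", "Wh", "Uh", "bh")}
# With all weights zero, z = 0.5 and h_tilde = 0, so the state is halved.
print(gru_update(1.0, 3.0, zeros))  # 0.5
```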

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.

class SetGather(*args, **kwargs)[source]¶

Set2set gather layer for graph-based models.

Models using this layer must set pad_batches=True.

__init__(M, batch_size, n_hidden=100, init='orthogonal', **kwargs)[source]¶

Parameters:

M (int) – Number of LSTM steps
batch_size (int) – Number of samples in a batch (all batches must have the same size).
n_hidden (int, optional) – Number of hidden units in the passing phase.

get_config()[source]¶

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.

Returns:: Python dictionary.

build(input_shape)[source]¶

Creates the variables of the layer (for subclass implementers).

This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call. It is invoked automatically before the first execution of call().

This is typically used to create the weights of Layer subclasses (at the discretion of the subclass implementer).

Parameters:: input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).

call(inputs)[source]¶

Perform M steps of set2set gather.

Detailed descriptions in: https://arxiv.org/abs/1511.06391
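Each of the M set2set steps reads from the set of atom vectors using content-based attention. Each vector is scored by its dot product with a query q, a softmax turns the scores into weights, and the weighted sum r is concatenated to q. The plain-Python sketch below shows one such read. It omits the LSTM that, in set2set, produces the next query from the previous output, so it covers only the attention part of the procedure. Because the read is a weighted sum, reordering the set does not change the result beyond floating-point rounding.

```python
import math
from typing import List


def attention_readout(memory: List[List[float]], q: List[float]) -> List[float]:
    """One set2set-style read: attend over `memory` with query `q`.

    Returns q concatenated with the attention-weighted sum r, i.e. [q, r].
    """
    scores = [sum(a * b for a, b in zip(x, q)) for x in memory]  # dot products
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    r = [sum(w * x[k] for w, x in zip(weights, memory)) for k in range(len(q))]
    return q + r


atoms = [[1.0, 0.0], [0.0, 1.0]]
print(attention_readout(atoms, [0.0, 0.0]))  # [0.0, 0.0, 0.5, 0.5]
```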

Torch Layers¶

class AtomicConv(n_tasks: int, frag1_num_atoms: int = 70, frag2_num_atoms: int = 634, complex_num_atoms: int = 701, max_num_neighbors: int = 12, batch_size: int = 24, atom_types: Sequence[float] = [6, 7.0, 8.0, 9.0, 11.0, 12.0, 15.0, 16.0, 17.0, 20.0, 25.0, 30.0, 35.0, 53.0, -1.0], radial: Sequence[Sequence[float]] = [[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0], [0.0, 4.0, 8.0], [0.4]], layer_sizes=[100], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ['relu'], init: str = 'trunc_normal_', **kwargs)[source]¶

Implements an Atomic Convolution Model.

Atomic convolutional networks are a variant of graph convolutions. The difference is that the “graph” here is the nearest-neighbors graph in 3D space [1]. AtomicConv leverages these 3D connections to train models that learn to predict energetic states starting from the spatial geometry of the input molecules.

References

Examples

>>> n_tasks = 1
>>> frag1_num_atoms = 70
>>> frag2_num_atoms = 634
>>> complex_num_atoms = 701
>>> max_num_neighbors = 12
>>> batch_size = 24
>>> atom_types = [
...     6, 7., 8., 9., 11., 12., 15., 16., 17., 20., 25., 30., 35., 53.,
...     -1.
... ]
>>> radial = [[
...     1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5,
...     8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0
... ], [0.0, 4.0, 8.0], [0.4]]
>>> layer_sizes = [32, 32, 16]
>>> acnn_model = AtomicConv(n_tasks=n_tasks,
... frag1_num_atoms=frag1_num_atoms,
... frag2_num_atoms=frag2_num_atoms,
... complex_num_atoms=complex_num_atoms,
... max_num_neighbors=max_num_neighbors,
... batch_size=batch_size,
... atom_types=atom_types,
... radial=radial,
... layer_sizes=layer_sizes)

__init__(n_tasks: int, frag1_num_atoms: int = 70, frag2_num_atoms: int = 634, complex_num_atoms: int = 701, max_num_neighbors: int = 12, batch_size: int = 24, atom_types: Sequence[float] = [6, 7.0, 8.0, 9.0, 11.0, 12.0, 15.0, 16.0, 17.0, 20.0, 25.0, 30.0, 35.0, 53.0, -1.0], radial: Sequence[Sequence[float]] = [[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0], [0.0, 4.0, 8.0], [0.4]], layer_sizes=[100], weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = ['relu'], init: str = 'trunc_normal_', **kwargs) → None[source]¶

Parameters:

n_tasks (int) – number of tasks
frag1_num_atoms (int) – Number of atoms in first fragment
frag2_num_atoms (int) – Number of atoms in second fragment
max_num_neighbors (int) – Maximum number of neighbors possible for an atom. Recall neighbors are spatial neighbors.
atom_types (list) – List of atoms recognized by model. Atoms are indicated by their atomic numbers.
radial (list) – Radial parameters used in the atomic convolution transformation.
layer_sizes (list) – the size of each dense layer in the network. The length of this list determines the number of layers.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
activation_fns (list or object) – the activation function to apply to each layer, given as a callable or as a name such as ‘relu’. The length of this list should equal len(layer_sizes). Alternatively, this may be a single value instead of a list, where the same value is used for every layer.
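The radial argument can be read as three lists: cutoffs r_c, centers r_s and Gaussian widths gamma. The sketch below makes two assumptions. First, each filter combines one value from each list. Second, each filter is a Gaussian exp(-gamma (r - r_s)^2) multiplied by the cosine cutoff 0.5 (cos(pi r / r_c) + 1), which is zero beyond r_c, as in the atomic convolution formulation. radial_filter is a hypothetical helper, not part of the DeepChem API.

```python
import math
from itertools import product


def radial_filter(r, r_c, r_s, gamma):
    """Gaussian centred at r_s, damped by a cosine cutoff that vanishes at r_c."""
    if r > r_c:
        return 0.0  # neighbours beyond the cutoff contribute nothing
    gaussian = math.exp(-gamma * (r - r_s) ** 2)
    cutoff = 0.5 * (math.cos(math.pi * r / r_c) + 1.0)
    return gaussian * cutoff


# Shortened version of the radial argument: [cutoffs, centres, widths].
radial = [[1.5, 2.0, 12.0], [0.0, 4.0, 8.0], [0.4]]
filters = list(product(*radial))  # every (r_c, r_s, gamma) triple
print(len(filters))  # 9

# Response of each filter to a neighbour at a hypothetical distance of 3.0.
responses = [radial_filter(3.0, *params) for params in filters]
```

Under the same assumption, the default radial lists above would give 22 x 3 x 1 = 66 filters.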

forward(inputs: Tensor | Sequence[Tensor])[source]¶

Parameters:: inputs (torch.Tensor) – Input Tensor
Returns:: Output for each label.
Return type:: torch.Tensor

class MultilayerPerceptron(d_input: int, d_output: int, d_hidden: tuple | None = None, dropout: float = 0.0, batch_norm: bool = False, batch_norm_momentum: float = 0.1, activation_fn: Callable | str = 'relu', skip_connection: bool = False, weighted_skip: bool = True)[source]¶

A simple fully connected feed-forward network, otherwise known as a multilayer perceptron (MLP).

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import MultilayerPerceptron
>>> model = MultilayerPerceptron(d_input=10, d_hidden=(2,3), d_output=2, dropout=0.0, activation_fn='relu')
>>> x = torch.ones(2, 10)
>>> out = model(x)
>>> print(out.shape)
torch.Size([2, 2])

__init__(d_input: int, d_output: int, d_hidden: tuple | None = None, dropout: float = 0.0, batch_norm: bool = False, batch_norm_momentum: float = 0.1, activation_fn: Callable | str = 'relu', skip_connection: bool = False, weighted_skip: bool = True)[source]¶

Initialize the model.

Parameters:

d_input (int) – the dimension of the input layer
d_output (int) – the dimension of the output layer
d_hidden (tuple) – the dimensions of the hidden layers
dropout (float) – the dropout probability
batch_norm (bool) – whether to use batch normalization
batch_norm_momentum (float) – the momentum for batch normalization
activation_fn (str) – the activation function to use in the hidden layers
skip_connection (bool) – whether to add a skip connection from the input to the output
weighted_skip (bool) – whether to add a weighted skip connection from the input to the output

build_layers()[source]¶: Build the layers of the model, iterating through the hidden dimensions to produce a list of layers.
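One plausible reading of build_layers is that the input, hidden and output sizes are chained into consecutive (in, out) pairs, one per linear layer. The hypothetical helper below computes those pairs. It ignores dropout, batch normalization, activations and skip connections, and it assumes that d_hidden=None yields a single linear layer. For the example above, it gives a final size of 2, which agrees with the torch.Size([2, 2]) output.

```python
from typing import List, Optional, Sequence, Tuple


def layer_dims(d_input: int, d_output: int,
               d_hidden: Optional[Sequence[int]] = None) -> List[Tuple[int, int]]:
    """(in_features, out_features) for each linear layer, from input to output."""
    dims = [d_input, *(d_hidden or ()), d_output]
    return list(zip(dims[:-1], dims[1:]))


print(layer_dims(10, 2, (2, 3)))  # [(10, 2), (2, 3), (3, 2)]
print(layer_dims(10, 2))          # [(10, 2)]
```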

forward(x: Tensor) → Tensor[source]¶: Forward pass of the model.

class CNNModule(n_tasks: int, n_features: int, dims: int, layer_filters: List[int] = [100], kernel_size: int | Sequence[int] = 5, strides: int | Sequence[int] = 1, weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', pool_type: str = 'max', mode: str = 'classification', n_classes: int = 2, uncertainty: bool = False, residual: bool = False, padding: int | str = 'valid')[source]¶

A 1, 2, or 3 dimensional convolutional network for either regression or classification. The network consists of the following sequence of layers:

- A configurable number of convolutional layers
- A global pooling layer (either max pool or average pool)
- A final fully connected layer to compute the output

It optionally can compose the model from pre-activation residual blocks, as described in https://arxiv.org/abs/1603.05027, rather than a simple stack of convolution layers. This often leads to easier training, especially when using a large number of layers. Note that residual blocks can only be used when successive layers have the same output shape. Wherever the output shape changes, a simple convolution layer will be used even if residual=True.

Examples

>>> import torch
>>> model = CNNModule(n_tasks=5, n_features=8, dims=2, layer_filters=[3,8,8,16], kernel_size=3, n_classes = 7, mode='classification', uncertainty=False, padding='same')
>>> x = torch.ones(2, 224, 224, 8)
>>> x = model(x)
>>> for tensor in x:
...    print(tensor.shape)
torch.Size([2, 5, 7])
torch.Size([2, 5, 7])

__init__(n_tasks: int, n_features: int, dims: int, layer_filters: List[int] = [100], kernel_size: int | Sequence[int] = 5, strides: int | Sequence[int] = 1, weight_init_stddevs: float | Sequence[float] = 0.02, bias_init_consts: float | Sequence[float] = 1.0, dropouts: float | Sequence[float] = 0.5, activation_fns: Callable | str | Sequence[Callable | str] = 'relu', pool_type: str = 'max', mode: str = 'classification', n_classes: int = 2, uncertainty: bool = False, residual: bool = False, padding: int | str = 'valid') → None[source]¶

Create a CNN.

Parameters:

n_tasks (int) – number of tasks
n_features (int) – number of features
dims (int) – the number of dimensions to apply convolutions over (1, 2, or 3)
layer_filters (list) – the number of output filters for each convolutional layer in the network. The length of this list determines the number of layers.
kernel_size (int, tuple, or list) – a list giving the shape of the convolutional kernel for each layer. Each element may be either an int (use the same kernel width for every dimension) or a tuple (the kernel width along each dimension). Alternatively this may be a single int or tuple instead of a list, in which case the same kernel shape is used for every layer.
strides (int, tuple, or list) – a list giving the stride between applications of the kernel for each layer. Each element may be either an int (use the same stride for every dimension) or a tuple (the stride along each dimension). Alternatively this may be a single int or tuple instead of a list, in which case the same stride is used for every layer.
weight_init_stddevs (list or float) – the standard deviation of the distribution to use for weight initialization of each layer. The length of this list should equal len(layer_filters)+1, where the final element corresponds to the dense layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
bias_init_consts (list or float) – the value to initialize the biases in each layer to. The length of this list should equal len(layer_filters)+1, where the final element corresponds to the dense layer. Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
dropouts (list or float) – the dropout probability to use for each layer. The length of this list should equal len(layer_filters). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer.
activation_fns (str or list) – the torch activation function to apply to each layer. The length of this list should equal len(layer_filters). Alternatively this may be a single value instead of a list, in which case the same value is used for every layer; ‘relu’ by default.
pool_type (str) – the type of pooling layer to use, either ‘max’ or ‘average’
mode (str) – Either ‘classification’ or ‘regression’
n_classes (int) – the number of classes to predict (only used in classification mode)
uncertainty (bool) – if True, include extra outputs and loss terms to enable the uncertainty in outputs to be predicted
residual (bool) – if True, the model will be composed of pre-activation residual blocks instead of a simple stack of convolutional layers.
padding (str, int or tuple) – the padding to use for convolutional layers, either ‘valid’ or ‘same’
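Several parameters above accept either one value for every layer or a per-layer list. The hypothetical per_layer helper below sketches that expansion for scalar-or-list parameters such as dropouts and weight_init_stddevs, including the extra entry for the final dense layer. It does not handle the per-dimension tuples that kernel_size and strides also accept.

```python
from typing import List, Sequence, Union


def per_layer(value: Union[float, Sequence[float]], n_layers: int) -> List[float]:
    """Broadcast a scalar to every layer, or validate an explicit per-layer list."""
    if isinstance(value, (int, float)):
        return [float(value)] * n_layers
    values = [float(v) for v in value]
    if len(values) != n_layers:
        raise ValueError(f"expected {n_layers} values, got {len(values)}")
    return values


layer_filters = [3, 8, 8, 16]
dropouts = per_layer(0.5, len(layer_filters))      # one per convolutional layer
stddevs = per_layer(0.02, len(layer_filters) + 1)  # +1 for the final dense layer
print(dropouts)      # [0.5, 0.5, 0.5, 0.5]
print(len(stddevs))  # 5
```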

forward(inputs: Tensor | Sequence[Tensor]) → List[Any][source]¶

Parameters:: inputs (torch.Tensor) – Input Tensor
Returns:: Output as per use case: regression/classification
Return type:: List[torch.Tensor]

class ScaleNorm(scale: float, eps: float = 1e-05)[source]¶

Apply Scale Normalization to input.

The ScaleNorm layer first computes the square root of the scale, then computes the matrix/vector norm of the input tensor. The norm value is calculated as sqrt(scale) / matrix norm. Finally, the result is returned as input_tensor * norm value.

This layer can be used instead of LayerNorm when a scaled version of the norm is required. Instead of performing the scaling operation (scale / norm) in a lambda-like layer, we are defining it within this layer to make prototyping more efficient.
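The computation described above can be sketched in plain Python. The sketch assumes the norm is the L2 norm over the last (feature) dimension, clamped below at eps so that zero rows do not cause division by zero. It also treats sqrt(scale) as a fixed constant. With scale = 0.25, every row whose norm exceeds eps is rescaled to length 0.5.

```python
import math


def scale_norm(x, scale, eps=1e-5):
    """Rescale each row of x to L2 norm sqrt(scale)."""
    g = math.sqrt(scale)
    out = []
    for row in x:
        norm = max(math.sqrt(sum(v * v for v in row)), eps)  # clamp tiny norms
        out.append([v * g / norm for v in row])
    return out


print(scale_norm([[3.0, 4.0]], scale=0.25))  # [[0.3, 0.4]]
```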

References

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import ScaleNorm
>>> scale = 0.35
>>> layer = ScaleNorm(scale)
>>> input_tensor = torch.tensor([[1.269, 39.36], [0.00918, -9.12]])
>>> output_tensor = layer(input_tensor)

__init__(scale: float, eps: float = 1e-05)[source]¶

Initialize a ScaleNorm layer.

Parameters:

scale (float) – Scale magnitude.
eps (float) – Epsilon value. Default = 1e-5.

forward(x: Tensor) → Tensor[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class MATEncoderLayer(dist_kernel: str = 'softmax', lambda_attention: float = 0.33, lambda_distance: float = 0.33, h: int = 16, sa_hsize: int = 1024, sa_dropout_p: float = 0.0, output_bias: bool = True, d_input: int = 1024, d_hidden: int = 1024, d_output: int = 1024, activation: str = 'leakyrelu', n_layers: int = 1, ff_dropout_p: float = 0.0, encoder_hsize: int = 1024, encoder_dropout_p: float = 0.0)[source]¶

Encoder layer for use in the Molecular Attention Transformer [1].

The MATEncoder layer primarily consists of a self-attention layer (MultiHeadedMATAttention) and a feed-forward layer (PositionwiseFeedForward). This layer can be stacked multiple times to form an encoder.

References

Examples

>>> from rdkit import Chem
>>> import torch
>>> import deepchem
>>> from deepchem.models.torch_models.layers import MATEmbedding, MATEncoderLayer
>>> input_smile = "CC"
>>> feat = deepchem.feat.MATFeaturizer()
>>> out = feat.featurize(input_smile)
>>> node = torch.tensor(out[0].node_features).float().unsqueeze(0)
>>> adj = torch.tensor(out[0].adjacency_matrix).float().unsqueeze(0)
>>> dist = torch.tensor(out[0].distance_matrix).float().unsqueeze(0)
>>> mask = torch.sum(torch.abs(node), dim=-1) != 0
>>> layer = MATEncoderLayer()
>>> op = MATEmbedding()(node)
>>> output = layer(op, mask, adj, dist)

__init__(dist_kernel: str = 'softmax', lambda_attention: float = 0.33, lambda_distance: float = 0.33, h: int = 16, sa_hsize: int = 1024, sa_dropout_p: float = 0.0, output_bias: bool = True, d_input: int = 1024, d_hidden: int = 1024, d_output: int = 1024, activation: str = 'leakyrelu', n_layers: int = 1, ff_dropout_p: float = 0.0, encoder_hsize: int = 1024, encoder_dropout_p: float = 0.0)[source]¶

Initialize a MATEncoder layer.

Parameters:

dist_kernel (str) – Kernel activation to be used. Can be either ‘softmax’ for softmax or ‘exp’ for exponential, for the self-attention layer.
lambda_attention (float) – Constant to be multiplied with the attention matrix in the self-attention layer.
lambda_distance (float) – Constant to be multiplied with the distance matrix in the self-attention layer.
h (int) – Number of attention heads for the self-attention layer.
sa_hsize (int) – Size of dense layer in the self-attention layer.
sa_dropout_p (float) – Dropout probability for the self-attention layer.
output_bias (bool) – If True, dense layers will use bias vectors in the self-attention layer.
d_input (int) – Size of input layer in the feed-forward layer.
d_hidden (int) – Size of hidden layer in the feed-forward layer.
d_output (int) – Size of output layer in the feed-forward layer.
activation (str) – Activation function to be used in the feed-forward layer. Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU, ‘tanh’ for TanH, ‘selu’ for SELU, ‘elu’ for ELU and ‘linear’ for linear activation.
n_layers (int) – Number of layers in the feed-forward layer.
ff_dropout_p (float) – Dropout probability in the feed-forward layer.
encoder_hsize (int) – Size of Dense layer for the encoder itself.
encoder_dropout_p (float) – Dropout probability for connections in the encoder layer.

forward(x: Tensor, mask: Tensor, adj_matrix: Tensor, distance_matrix: Tensor, sa_dropout_p: float = 0.0) → Tensor[source]¶

Output computation for the MATEncoder layer.

In the MATEncoderLayer initialization, self.sublayer is defined as an nn.ModuleList of 2 layers. The computation is passed through these layers sequentially. nn.ModuleList is subscriptable, so its layers can be accessed as self.sublayer[0], for example.

Parameters:

x (torch.Tensor) – Input tensor.
mask (torch.Tensor) – Masks out padding values so that they are not taken into account when computing the attention score.
adj_matrix (torch.Tensor) – Adjacency matrix of a molecule.
distance_matrix (torch.Tensor) – Distance matrix of a molecule.
sa_dropout_p (float) – Dropout probability for the self-attention layer (MultiHeadedMATAttention).

class MultiHeadedMATAttention(dist_kernel: str = 'softmax', lambda_attention: float = 0.33, lambda_distance: float = 0.33, h: int = 16, hsize: int = 1024, dropout_p: float = 0.0, output_bias: bool = True)[source]¶

First constructs an attention layer tailored to the Molecular Attention Transformer [1] and then converts it into multi-headed attention.

In multi-headed attention, the attention mechanism runs multiple times in parallel, once per attention head. Thus, different subsequences of a given sequence can be processed differently. The query, key and value parameters are split multiple ways and each split is passed separately through a different attention head.

References

Examples

>>> from deepchem.models.torch_models.layers import MultiHeadedMATAttention, MATEmbedding
>>> import deepchem as dc
>>> import torch
>>> input_smile = "CC"
>>> feat = dc.feat.MATFeaturizer()
>>> out = feat.featurize(input_smile)
>>> node = torch.tensor(out[0].node_features).float().unsqueeze(0)
>>> adj = torch.tensor(out[0].adjacency_matrix).float().unsqueeze(0)
>>> dist = torch.tensor(out[0].distance_matrix).float().unsqueeze(0)
>>> mask = torch.sum(torch.abs(node), dim=-1) != 0
>>> layer = MultiHeadedMATAttention(
...    dist_kernel='softmax',
...    lambda_attention=0.33,
...    lambda_distance=0.33,
...    h=16,
...    hsize=1024,
...    dropout_p=0.0)
>>> op = MATEmbedding()(node)
>>> output = layer(op, op, op, mask, adj, dist)

__init__(dist_kernel: str = 'softmax', lambda_attention: float = 0.33, lambda_distance: float = 0.33, h: int = 16, hsize: int = 1024, dropout_p: float = 0.0, output_bias: bool = True)[source]¶

Initialize a multi-headed attention layer.

Parameters:

dist_kernel (str) – Kernel activation to be used. Can be either ‘softmax’ for softmax or ‘exp’ for exponential.
lambda_attention (float) – Constant to be multiplied with the attention matrix.
lambda_distance (float) – Constant to be multiplied with the distance matrix.
h (int) – Number of attention heads.
hsize (int) – Size of dense layer.
dropout_p (float) – Dropout probability.
output_bias (bool) – If True, dense layers will use bias vectors.

forward(query: Tensor, key: Tensor, value: Tensor, mask: Tensor, adj_matrix: Tensor, distance_matrix: Tensor, dropout_p: float = 0.0, eps: float = 1e-06, inf: float = 1000000000000.0) → Tensor[source]¶

Output computation for the MultiHeadedMATAttention layer.

Parameters:

query (torch.Tensor) – Standard query parameter for attention.
key (torch.Tensor) – Standard key parameter for attention.
value (torch.Tensor) – Standard value parameter for attention.
mask (torch.Tensor) – Masks out padding values so that they are not taken into account when computing the attention score.
adj_matrix (torch.Tensor) – Adjacency matrix of the input molecule, returned from dc.feat.MATFeaturizer().
distance_matrix (torch.Tensor) – Distance matrix of the input molecule, returned from dc.feat.MATFeaturizer().
dropout_p (float) – Dropout probability.
eps (float) – Epsilon value.
inf (float) – Value of infinity to be used.
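For a single head, the Molecular Attention Transformer combines three row-stochastic matrices: scaled dot-product attention, a kernel of the distance matrix and the adjacency matrix. The sketch below relies on three assumptions:

- The adjacency weight is 1 - lambda_attention - lambda_distance.
- The ‘softmax’ distance kernel is a row-wise softmax of -D.
- The adjacency matrix is row-normalised with eps in the denominator.

It omits masking, dropout, multiple heads and the learned query/key/value projections, so it illustrates only the weighting scheme. mat_attention is a hypothetical helper.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def mat_attention(q, k, v, adj, dist, lambda_attention=0.33,
                  lambda_distance=0.33, eps=1e-6):
    """Single-head MAT-style attention over n atoms.

    q, k, v: n x d lists; adj, dist: n x n adjacency and distance matrices.
    Returns (weights, output), where output = weights @ v.
    """
    d = len(q[0])
    lambda_adjacency = 1.0 - lambda_attention - lambda_distance  # assumed weighting
    weights = []
    for i in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[i], k_j)) / math.sqrt(d) for k_j in k]
        p_attn = softmax(scores)                 # scaled dot-product attention
        p_dist = softmax([-x for x in dist[i]])  # 'softmax' distance kernel
        row_sum = sum(adj[i]) + eps
        p_adj = [a / row_sum for a in adj[i]]    # row-normalised adjacency
        weights.append([lambda_attention * pa + lambda_distance * pd
                        + lambda_adjacency * pj
                        for pa, pd, pj in zip(p_attn, p_dist, p_adj)])
    output = [[sum(w_ij * v[j][c] for j, w_ij in enumerate(w_i))
               for c in range(len(v[0]))] for w_i in weights]
    return weights, output


# Two bonded atoms with zero queries and keys, so p_attn is uniform.
q = k = [[0.0, 0.0], [0.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
adj = [[0.0, 1.0], [1.0, 0.0]]
dist = [[0.0, 1.5], [1.5, 0.0]]
weights, output = mat_attention(q, k, v, adj, dist)
print([round(sum(row), 4) for row in weights])  # each row sums to about 1
```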

class SublayerConnection(size: int, dropout_p: float = 0.0)[source]¶

SublayerConnection layer based on the paper Attention Is All You Need.

The SublayerConnection normalizes and applies dropout to the output tensor of an arbitrary layer. It then adds a residual connection between the input of that layer and the dropout-adjusted layer output.

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import SublayerConnection
>>> layer = SublayerConnection(2, 0.)
>>> input_ar = torch.tensor([[1., 2.], [5., 6.]])
>>> output = layer(input_ar, input_ar)

__init__(size: int, dropout_p: float = 0.0)

Initialize a SublayerConnection Layer.

Parameters:

size (int) – Size of layer.
dropout_p (float) – Dropout probability.

forward(x: Tensor, output: Tensor) → Tensor

Output computation for the SublayerConnection layer.

Takes an input tensor x and adds to it the normalized, dropout-adjusted output of the sublayer. This adds a residual connection around the LayerNorm-normalized sublayer output.

Parameters:

x (torch.Tensor) – Input tensor.
output (torch.Tensor) – Output of the sublayer; it is normalized, passed through dropout and added to x.
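The residual computation can be sketched in NumPy as x + dropout(LayerNorm(output)). This is a minimal illustration under that assumption, not DeepChem's code: LayerNorm is shown without its learnable gain and bias, dropout is the usual inverted dropout, and the helper names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learnable gain/bias here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def dropout(x, p, rng):
    # Inverted dropout: zero entries with probability p, rescale survivors by 1/(1-p).
    if p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def sublayer_connection(x, output, dropout_p=0.0, rng=None):
    """Residual connection: x + dropout(LayerNorm(output))."""
    rng = np.random.default_rng() if rng is None else rng
    return x + dropout(layer_norm(output), dropout_p, rng)


x = np.array([[1., 2.], [5., 6.]])
res = sublayer_connection(x, x)  # same call pattern as the doctest above
```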

class PositionwiseFeedForward(d_input: int = 1024, d_hidden: int = 1024, d_output: int = 1024, activation: str = 'leakyrelu', n_layers: int = 1, dropout_p: float = 0.0, dropout_at_input_no_act: bool = False)

PositionwiseFeedForward is a layer used to define the position-wise feed-forward (FFN) algorithm for the Molecular Attention Transformer [1].

Each layer in the MAT encoder contains a fully connected feed-forward network which applies two linear transformations and the given activation function. This is done in addition to the SublayerConnection module.

Note: This modified version of the PositionwiseFeedForward class adds the dropout_at_input_no_act option so that it can also be used to define the feed-forward network (FFN) of the Directed Message Passing Neural Network (D-MPNN) [2].

References

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import PositionwiseFeedForward
>>> feed_fwd_layer = PositionwiseFeedForward(d_input = 2, d_hidden = 2, d_output = 2, activation = 'relu', n_layers = 1, dropout_p = 0.1)
>>> input_tensor = torch.tensor([[1., 2.], [5., 6.]])
>>> output_tensor = feed_fwd_layer(input_tensor)

__init__(d_input: int = 1024, d_hidden: int = 1024, d_output: int = 1024, activation: str = 'leakyrelu', n_layers: int = 1, dropout_p: float = 0.0, dropout_at_input_no_act: bool = False)

Initialize a PositionwiseFeedForward layer.

Parameters:

d_input (int) – Size of input layer.
d_hidden (int) – Size of hidden layer (same as d_input if d_hidden = 0).
d_output (int) – Size of output layer (same as d_input if d_output = 0).
activation (str) – Activation function to be used. Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU, ‘tanh’ for Tanh, ‘selu’ for SELU, ‘elu’ for ELU and ‘linear’ for linear activation.
n_layers (int) – Number of layers.
dropout_p (float) – Dropout probability.
dropout_at_input_no_act (bool) – If True, dropout is applied to the input tensor; for a single layer, the result is then not passed through an activation function.

forward(x: Tensor) → Tensor

Output computation for the PositionwiseFeedForward layer.

Parameters:: x (torch.Tensor) – Input tensor.
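The classic two-layer position-wise feed-forward network applies the same transformation, act(x W1 + b1) W2 + b2, independently to every position (row) of the input. A NumPy sketch with random weights and a LeakyReLU activation (the layer's default); it is an illustration only and does not reproduce how DeepChem stacks n_layers or handles dropout_at_input_no_act.

```python
import numpy as np


def leaky_relu(z, negative_slope=0.01):
    return np.where(z > 0, z, negative_slope * z)


def positionwise_ffn(x, w1, b1, w2, b2, act=leaky_relu):
    """Apply act(x @ w1 + b1) @ w2 + b2 to every row (position) of x."""
    return act(x @ w1 + b1) @ w2 + b2


rng = np.random.default_rng(0)
d_input, d_hidden, d_output = 2, 4, 2
w1, b1 = rng.normal(size=(d_input, d_hidden)), np.zeros(d_hidden)
w2, b2 = rng.normal(size=(d_hidden, d_output)), np.zeros(d_output)
x = np.array([[1., 2.], [5., 6.]])
y = positionwise_ffn(x, w1, b1, w2, b2)
```

Because the same weights are shared across positions, permuting the rows of x simply permutes the rows of the output.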

class MATEmbedding(d_input: int = 36, d_output: int = 1024, dropout_p: float = 0.0)

Embedding layer to create embedding for inputs.

An embedding layer converts each input into a vector representation. In the MATEmbedding layer, the input tensor is passed through a linear layer with dropout, and the resulting vector is returned.

References

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import MATEmbedding
>>> layer = MATEmbedding(d_input = 3, d_output = 3, dropout_p = 0.2)
>>> input_tensor = torch.tensor([1., 2., 3.])
>>> output = layer(input_tensor)

__init__(d_input: int = 36, d_output: int = 1024, dropout_p: float = 0.0)

Initialize a MATEmbedding layer.

Parameters:

d_input (int) – Size of input layer.
d_output (int) – Size of output layer.
dropout_p (float) – Dropout probability for layer.

forward(x: Tensor) → Tensor

Computation for the MATEmbedding layer.

Parameters:: x (torch.Tensor) – Input tensor to be converted into a vector.
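A NumPy sketch of the embedding step, assuming the layer applies the linear projection first and dropout second; mat_embedding is a hypothetical helper, and the weight and bias stand in for the layer's learned parameters.

```python
import numpy as np


def mat_embedding(x, weight, bias, dropout_p=0.0, rng=None):
    """Dropout(Linear(x)): project inputs of size d_input to size d_output."""
    z = x @ weight + bias
    if dropout_p > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(z.shape) >= dropout_p
        z = z * keep / (1.0 - dropout_p)  # inverted dropout keeps the expected value
    return z


w = np.eye(3) * 2.0   # d_input = d_output = 3, as in the doctest above
b = np.ones(3)
x = np.array([1., 2., 3.])
embedded = mat_embedding(x, w, b)
```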

class MATGenerator(hsize: int = 1024, aggregation_type: str = 'mean', d_output: int = 1, n_layers: int = 1, dropout_p: float = 0.0, attn_hidden: int = 128, attn_out: int = 4)

MATGenerator defines the linear and softmax generator step for the Molecular Attention Transformer [1].

In the MATGenerator, a Generator is defined which performs the Linear + Softmax generation step. The operations performed by the attention output layer depend on the selected aggregation type.

References

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import MATGenerator
>>> layer = MATGenerator(hsize = 3, aggregation_type = 'mean', d_output = 1, n_layers = 1, dropout_p = 0.3, attn_hidden = 128, attn_out = 4)
>>> input_tensor = torch.tensor([1., 2., 3.])
>>> mask = torch.tensor([1., 1., 1.])
>>> output = layer(input_tensor, mask)

__init__(hsize: int = 1024, aggregation_type: str = 'mean', d_output: int = 1, n_layers: int = 1, dropout_p: float = 0.0, attn_hidden: int = 128, attn_out: int = 4)

Initialize a MATGenerator.

Parameters:

hsize (int) – Size of input layer.
aggregation_type (str) – Type of aggregation to be used. Can be ‘grover’, ‘mean’ or ‘contextual’.
d_output (int) – Size of output layer.
n_layers (int) – Number of layers in MATGenerator.
dropout_p (float) – Dropout probability for layer.
attn_hidden (int) – Size of hidden attention layer.
attn_out (int) – Size of output attention layer.

forward(x: Tensor, mask: Tensor) → Tensor

Computation for the MATGenerator layer.

Parameters:

x (torch.Tensor) – Input tensor.
mask (torch.Tensor) – Mask for padding so that padded values do not get included in attention score calculation.
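With aggregation_type='mean', the atom representations are averaged before the output projection, and the mask keeps padded atoms out of the average. A NumPy sketch of masked mean pooling, assuming inputs of shape (batch, n_atoms, hsize); masked_mean is a hypothetical helper and the projection to d_output is omitted.

```python
import numpy as np


def masked_mean(x, mask):
    """Average the atom vectors of each molecule, ignoring padded atoms.

    x: (batch, n_atoms, hsize); mask: (batch, n_atoms) with 1 for real atoms.
    """
    m = mask[..., None].astype(x.dtype)          # (batch, n_atoms, 1)
    return (x * m).sum(axis=1) / m.sum(axis=1)   # (batch, hsize)


x = np.array([[[1., 2.], [3., 4.], [100., 100.]]])  # the last atom is padding
mask = np.array([[1, 1, 0]])
pooled = masked_mean(x, mask)
```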

cosine_dist(x, y)

Computes the inner product (cosine similarity) between two tensors.

This assumes that the two input tensors contain rows of vectors where each column represents a different feature. The output tensor will have elements that represent the inner product between pairs of normalized vectors in the rows of x and y. The two tensors need to have the same number of columns, because one cannot take the dot product between vectors of different lengths. For example, in sentence similarity and sentence classification tasks, the number of columns is the embedding size. In these tasks, the rows of the input tensors would be different test vectors or sentences. The input tensors themselves could be different batches. Using vectors or tensors of all 0s should be avoided, because the cosine similarity is undefined for zero vectors.

The vectors in the input tensors are first l2-normalized such that each vector has length or magnitude of 1. The inner product (dot product) is then taken between every pair of row vectors from the two input tensors and returned.
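The same computation (l2-normalize the rows, then multiply by the transpose) can be written in a few lines of NumPy. This is an independent sketch for illustration; cosine_dist itself operates on TensorFlow tensors.

```python
import numpy as np


def cosine_similarity_matrix(x, y):
    """Pairwise cosine similarity between rows of x (n, p) and y (m, p) -> (n, m)."""
    x_norm = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_norm = y / np.linalg.norm(y, axis=1, keepdims=True)
    return x_norm @ y_norm.T


x = np.array([[1., 0.], [1., 1.]])
y = np.array([[2., 0.], [0., 3.], [1., 1.]])
sim = cosine_similarity_matrix(x, y)
```

Entry (i, j) compares row i of x with row j of y; scaling a row does not change the result, and an all-zero row would cause a division by zero, which is why such inputs should be avoided.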

Examples

The cosine similarity between two equivalent vectors will be 1. The cosine similarity between two equivalent tensors (tensors where all the elements are the same) will be a tensor of 1s. In this scenario, if the input tensors x and y are each of shape (n,p), where each element in x and y is the same, then the output tensor would be a tensor of shape (n,n) with 1 in every entry.

>>> import numpy as np
>>> import tensorflow as tf
>>> import deepchem.models.layers as layers
>>> x = tf.ones((6, 4), dtype=tf.dtypes.float32, name=None)
>>> y_same = tf.ones((6, 4), dtype=tf.dtypes.float32, name=None)
>>> cos_sim_same = layers.cosine_dist(x,y_same)

x and y_same are the same tensor (equivalent at every element, in this case 1). As such, the pairwise inner product of the rows in x and y will always be 1. The output tensor will be of shape (6,6).

>>> diff = cos_sim_same - tf.ones((6, 6), dtype=tf.dtypes.float32, name=None)
>>> np.allclose(0.0, tf.reduce_sum(diff).numpy(), atol=1e-05)
True
>>> cos_sim_same.shape
TensorShape([6, 6])

The cosine similarity between two orthogonal vectors will be 0 (by definition). If every row in x is orthogonal to every row in y, then the output will be a tensor of 0s. In the following example, each row in the tensor x1 is orthogonal to each row in x2 because they are halves of an identity matrix.

>>> identity_tensor = tf.eye(512, dtype=tf.dtypes.float32)
>>> x1 = identity_tensor[0:256,:]
>>> x2 = identity_tensor[256:512,:]
>>> cos_sim_orth = layers.cosine_dist(x1,x2)

Each row in x1 is orthogonal to each row in x2. As such, the pairwise inner product of the rows in x1 and x2 will always be 0. Furthermore, because both input tensors have shape (256, 512), the output tensor will have shape (256, 256).

>>> np.allclose(0.0, tf.reduce_sum(cos_sim_orth).numpy(), atol=1e-05)
True
>>> cos_sim_orth.shape
TensorShape([256, 256])

Parameters:

x (tf.Tensor) – Input Tensor of shape (n, p). The shape of this input tensor should be n rows by p columns. Note that n need not equal m (the number of rows in y).
y (tf.Tensor) – Input Tensor of shape (m, p). The shape of this input tensor should be m rows by p columns. Note that m need not equal n (the number of rows in x).

Returns:

Returns a tensor of shape (n, m), that is, n rows by m columns. Each i,j-th entry of this output tensor is the inner product between the l2-normalized i-th row of the input tensor x and the l2-normalized j-th row of the input tensor y.

Return type:

tf.Tensor

class GraphNetwork(n_node_features: int = 32, n_edge_features: int = 32, n_global_features: int = 32, is_undirected: bool = True, residual_connection: bool = True)

Graph Networks

A Graph Network [1] takes a graph as input and returns an updated graph as output. The output graph has the same structure as the input graph, but with updated node features, edge features and global state features.

Parameters:

n_node_features (int) – Number of features in a node
n_edge_features (int) – Number of features in an edge
n_global_features (int) – Number of global features
is_undirected (bool, optional (default True)) – Whether the graph is undirected
residual_connection (bool, optional (default True)) – If True, the layer uses a residual connection during training

Example

>>> import torch
>>> from deepchem.models.torch_models.layers import GraphNetwork as GN
>>> n_nodes, n_node_features = 5, 10
>>> n_edges, n_edge_features = 5, 2
>>> n_global_features = 4
>>> node_features = torch.randn(n_nodes, n_node_features)
>>> edge_features = torch.randn(n_edges, n_edge_features)
>>> edge_index = torch.tensor([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]]).long()
>>> global_features = torch.randn(1, n_global_features)
>>> gn = GN(n_node_features=n_node_features, n_edge_features=n_edge_features, n_global_features=n_global_features)
>>> node_features, edge_features, global_features = gn(node_features, edge_index, edge_features, global_features)

References

__init__(n_node_features: int = 32, n_edge_features: int = 32, n_global_features: int = 32, is_undirected: bool = True, residual_connection: bool = True)

Initialize a GraphNetwork layer.

forward(node_features: Tensor, edge_index: Tensor, edge_features: Tensor, global_features: Tensor, batch: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor]

Output computation for a GraphNetwork

Parameters:

node_features (torch.Tensor) – Input node features of shape $(|\mathcal{V}|, F_n)$
edge_index (torch.Tensor) – Edge indexes of shape $(2, |\mathcal{E}|)$
edge_features (torch.Tensor) – Edge features of the graph, shape: $(|\mathcal{E}|, F_e)$
global_features (torch.Tensor) – Global features of the graph, shape: $(1, F_g)$ for a single graph (as in the example above), where $|\mathcal{V}|$ and $|\mathcal{E}|$ denote the number of nodes and edges in the graph, and $F_n$, $F_e$, $F_g$ denote the number of node features, edge features and global state features, respectively.
batch (torch.LongTensor (optional, default: None)) – A vector that maps each node to its respective graph identifier. The attribute is used only when more than one graph is batched together during a single forward pass.
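A graph-network block is commonly described (Battaglia et al., 2018) as three updates in sequence: edges, then nodes, then the global state. The NumPy sketch below shows that generic block for a single graph. It is an assumption-laden illustration, not DeepChem's implementation: the update functions phi_e, phi_v and phi_u are hypothetical random linear maps standing in for learned MLPs, messages are summed at the destination node, and undirected-edge handling is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(in_dim, out_dim):
    # A random, untrained linear map standing in for a learned MLP (hypothetical).
    w = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda z: z @ w


def graph_network_step(node_features, edge_index, edge_features, global_features,
                       phi_e, phi_v, phi_u):
    """One generic graph-network block for a single graph (no batching)."""
    src, dst = edge_index
    n_nodes, n_edges = len(node_features), len(edge_features)
    u = global_features  # shape (1, F_g)

    # 1) Edge update: each edge sees its own features, both endpoints and the global state.
    e_in = np.concatenate([edge_features, node_features[src], node_features[dst],
                           np.repeat(u, n_edges, axis=0)], axis=1)
    new_e = phi_e(e_in)

    # 2) Node update: sum the updated features of the edges arriving at each node.
    agg = np.zeros((n_nodes, new_e.shape[1]))
    np.add.at(agg, dst, new_e)
    v_in = np.concatenate([node_features, agg, np.repeat(u, n_nodes, axis=0)], axis=1)
    new_v = phi_v(v_in)

    # 3) Global update: combine the old global state with mean node and edge features.
    u_in = np.concatenate([u, new_v.mean(axis=0, keepdims=True),
                           new_e.mean(axis=0, keepdims=True)], axis=1)
    return new_v, new_e, phi_u(u_in)


F_n, F_e, F_g = 10, 2, 4
nodes = rng.normal(size=(5, F_n))
edges = rng.normal(size=(5, F_e))
edge_index = np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]])
u = rng.normal(size=(1, F_g))
phi_e = linear(F_e + 2 * F_n + F_g, F_e)
phi_v = linear(F_n + F_e + F_g, F_n)
phi_u = linear(F_g + F_n + F_e, F_g)
dv, de, du = graph_network_step(nodes, edge_index, edges, u, phi_e, phi_v, phi_u)
# A residual connection adds each update to the corresponding input.
new_v, new_e, new_u = nodes + dv, edges + de, u + du
```

The output graph keeps the input's shapes and connectivity, which matches the description above of returning an updated graph with the same structure.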

class Affine(dim: int)

Class which performs the Affine transformation.

This transformation maps the base distribution towards the target distribution with an affine (scale-and-shift) function: its learnable parameters rescale and shift the inputs.

Normalizing-flow transformations must be bijective so that the logarithm of the Jacobian determinant can be computed. For this reason, each transformation implements both a forward and an inverse pass.

Example

>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import Affine
>>> import torch
>>> from torch.distributions import MultivariateNormal
>>> # initialize the transformation layer's parameters
>>> dim = 2
>>> samples = 96
>>> transforms = Affine(dim)
>>> # forward pass based on a given distribution
>>> distribution = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> input = distribution.sample(torch.Size((samples, dim)))
>>> len(transforms.forward(input))
2
>>> # inverse pass based on a distribution
>>> len(transforms.inverse(input))
2

__init__(dim: int) → None

Create an Affine transform layer.

Parameters:: dim (int) – Number of dimensions of the data (the N of an N-dimensional dataset).

forward(x: Tensor) → Tuple[Tensor, Tensor]

Performs a transformation between two different distributions. This particular transformation represents the following function:

\[y = x \odot \exp(a) + b\]

where $\odot$ denotes element-wise multiplication, a is the scale parameter and b is the shift parameter. This method also returns the logarithm of the Jacobian determinant, which is useful when inverting the transformation and computing the probability of the transformed samples.
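Because the map acts element-wise, its Jacobian is diagonal, so the log-determinant has a simple closed form (a short derivation, assuming a holds one entry per data dimension):

\[\frac{\partial y}{\partial x} = \operatorname{diag}\big(\exp(a_1), \ldots, \exp(a_d)\big), \qquad \log\left|\det \frac{\partial y}{\partial x}\right| = \sum_{i=1}^{d} a_i\]

For the inverse map the Jacobian is $\operatorname{diag}(\exp(-a))$, so its log-determinant is $-\sum_{i=1}^{d} a_i$.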

Parameters:

x (torch.Tensor) – Tensor sample with the initial distribution data which will pass into the normalizing flow algorithm.

Returns:

y (torch.Tensor) – Transformed tensor according to Affine layer with the shape of ‘x’.
log_det_jacobian (torch.Tensor) – Logarithm of the Jacobian determinant of the transformation, which relates the densities of the initial and target distributions.

inverse(y: Tensor) → Tuple[Tensor, Tensor]

Performs a transformation between two different distributions. This transformation represents the backward pass of the function described above. Its mathematical representation is x = (y - b) / exp(a), where “a” is the scale parameter and “b” is the shift parameter. This method also returns the logarithm of the Jacobian determinant, which is useful when inverting the transformation and computing the probability of the transformed samples.

Parameters:

y (torch.Tensor) – Tensor sample with transformed distribution data which will be used in the normalizing algorithm inverse pass.

Returns:

x (torch.Tensor) – Transformed tensor according to Affine layer with the shape of ‘y’.
inverse_log_det_jacobian (torch.Tensor) – Logarithm of the Jacobian determinant of the inverse transformation, which relates the densities of the initial and target distributions.
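The forward and inverse passes can be checked against each other with a small NumPy sketch. It assumes a and b are vectors with one entry per dimension and returns one log-determinant per sample; the helper names are hypothetical, and in the real layer a and b are learnable parameters.

```python
import numpy as np


def affine_forward(x, a, b):
    """y = x * exp(a) + b, with log|det J| = sum(a) for every sample."""
    y = x * np.exp(a) + b
    log_det = np.full(x.shape[0], a.sum())
    return y, log_det


def affine_inverse(y, a, b):
    """x = (y - b) / exp(a), with log|det J| = -sum(a) for every sample."""
    x = (y - b) * np.exp(-a)
    inv_log_det = np.full(y.shape[0], -a.sum())
    return x, inv_log_det


rng = np.random.default_rng(0)
dim = 2
a, b = rng.normal(size=dim), rng.normal(size=dim)
x = rng.normal(size=(96, dim))
y, log_det = affine_forward(x, a, b)
x_rec, inv_log_det = affine_inverse(y, a, b)
```

Running the inverse on the forward output recovers the original samples, and the two log-determinants cancel, which is exactly the bijectivity property normalizing flows rely on.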

class RealNVPLayer(mask: Tensor, hidden_size: int)

Real NVP Transformation Layer

This class is a transformation layer used in a NormalizingFlow model. Real NVP (real-valued non-volume preserving) is a type of normalizing-flow layer whose main advantage is that its inverse pass is easy to compute [realnvp1], which makes it practical for learning a target distribution.

Example

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> from torch.distributions import MultivariateNormal
>>> from deepchem.models.torch_models.layers import RealNVPLayer
>>> dim = 2
>>> samples = 96
>>> data = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> tensor = data.sample(torch.Size((samples, dim)))

>>> n_layers = 4
>>> hidden_size = 16
>>> masks = F.one_hot(torch.tensor([i % 2 for i in range(n_layers)])).float()
>>> layers = nn.ModuleList([RealNVPLayer(mask, hidden_size) for mask in masks])

>>> for layer in layers:
...   _, inverse_log_det_jacobian = layer.inverse(tensor)
...   inverse_log_det_jacobian = inverse_log_det_jacobian.detach().numpy()
>>> len(inverse_log_det_jacobian)
96

References

[realnvp1]

Stimper, V., Schölkopf, B., & Hernández-Lobato, J. M. (2021). Resampling Base Distributions of Normalizing Flows. Retrieved from http://arxiv.org/abs/2110.15828

__init__(mask: Tensor, hidden_size: int) → None

Parameters:

mask (torch.Tensor) – Tensor of zeros and ones; its size depends on the number of layers and dimensions the user requests.
hidden_size (int) – The size of the outputs and inputs used on the internal nodes of the transformation layer.

forward(x: Sequence) → Tuple[Tensor, Tensor]

Forward pass.

This particular transformation is represented by the following function: y = m * x + (1 - m) * (x * exp(s(m * x)) + t(m * x)), where m is the mask, * denotes element-wise multiplication, and s and t are the learned scale and translation functions (each with an activation function). This method also returns the logarithm of the Jacobian determinant, which is useful when inverting the transformation and computing the probability of the transformed samples.

Parameters:

x (Sequence) – Tensor sample with the initial distribution data which will pass into the normalizing algorithm

Returns:

y (torch.Tensor) – Transformed tensor according to Real NVP layer with the shape of ‘x’.
log_det_jacobian (torch.Tensor) – Logarithm of the Jacobian determinant of the transformation, which relates the densities of the initial and target distributions.

inverse(y: Sequence) → Tuple[Tensor, Tensor]

Inverse pass

This method performs the inverse of the forward method. It also returns the logarithm of the Jacobian determinant, which is useful when learning the target distribution.

Parameters:

y (Sequence) – Tensor sample with transformed distribution data which will be used in the normalizing algorithm inverse pass.

Returns:

x (torch.Tensor) – Transformed tensor according to Real NVP layer with the shape of ‘y’.
inverse_log_det_jacobian (torch.Tensor) – Logarithm of the Jacobian determinant of the inverse transformation, which relates the densities of the initial and target distributions.
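A NumPy sketch of the standard Real NVP affine coupling, assuming the layer follows the usual formulation: the masked dimensions pass through unchanged and condition the scale s and translation t applied to the remaining dimensions. The networks s_net and t_net below are hypothetical fixed random functions, not the layer's learned modules.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden = 2, 16
W1 = rng.normal(scale=0.5, size=(dim, hidden))
Ws = rng.normal(scale=0.5, size=(hidden, dim))
Wt = rng.normal(scale=0.5, size=(hidden, dim))


def s_net(z):
    # Hypothetical scale network; the outer tanh keeps the log-scales bounded.
    return np.tanh(np.tanh(z @ W1) @ Ws)


def t_net(z):
    # Hypothetical translation network.
    return np.tanh(z @ W1) @ Wt


def coupling_forward(x, mask):
    x_masked = x * mask                      # dimensions left unchanged
    s, t = s_net(x_masked), t_net(x_masked)
    y = x_masked + (1 - mask) * (x * np.exp(s) + t)
    log_det = ((1 - mask) * s).sum(axis=-1)
    return y, log_det


def coupling_inverse(y, mask):
    y_masked = y * mask                      # equals x * mask, so s and t can be recomputed
    s, t = s_net(y_masked), t_net(y_masked)
    x = y_masked + (1 - mask) * (y - t) * np.exp(-s)
    inv_log_det = -((1 - mask) * s).sum(axis=-1)
    return x, inv_log_det


mask = np.array([1.0, 0.0])
x = rng.normal(size=(96, dim))
y, log_det = coupling_forward(x, mask)
x_rec, inv_log_det = coupling_inverse(y, mask)
```

The inverse is cheap because the masked half of y equals the masked half of x, so s and t never need to be inverted; alternating masks between layers (as in the example above) lets every dimension be transformed.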

class DMPNNEncoderLayer(use_default_fdim: bool = True, atom_fdim: int = 133, bond_fdim: int = 14, d_hidden: int = 300, depth: int = 3, bias: bool = False, activation: str = 'relu', dropout_p: float = 0.0, aggregation: str = 'mean', aggregation_norm: int | float = 100)

Encoder layer for use in the Directed Message Passing Neural Network (D-MPNN) [1].

The role of the DMPNNEncoderLayer class is to generate molecule encodings in the following steps:

Message passing phase
Get new atom hidden states and readout phase
Concatenate the global features

Let the diagram given below represent a molecule containing 5 atoms (nodes) and 4 bonds (edges):

1 — 5
|
2 — 4
|
3

Let the bonds from atoms 1->2 (B[12]) and 2->1 (B[21]) be considered as 2 different (directed) bonds. Applying the same convention to every bond, the total number of bonds = 8.

Let:

atom features : a1, a2, a3, a4, a5
hidden states of atoms : h1, h2, h3, h4, h5
bond features : b12, b21, b23, b32, b24, b42, b15, b51
initial hidden states of bonds : (0)h12, (0)h21, (0)h23, (0)h32, (0)h24, (0)h42, (0)h15, (0)h51

The initial hidden state of every bond is a function of the concatenation of the features of the bond's initial atom and the bond features.

Example: (0)h21 = func1(concat(a2, b21))

Note

Here func1 is self.W_i

The Message passing phase

The goal of the message-passing phase is to generate hidden states of all the atoms in the molecule.

The hidden state of an atom is a function of the concatenation of the atom features and the messages (at depth T).

A message is the sum of the hidden states of the bonds coming to the atom (at depth T).

Note

Depth refers to the number of iterations in the message passing phase (here, T iterations). After each iteration, the hidden states of the bonds are updated.

Example: h1 = func3(concat(a1, m1))

Note

Here func3 is self.W_o.

m1 refers to the message coming to the atom.

m1 = (T-1)h21 + (T-1)h51 (hidden state of bond 2->1 + hidden state of bond 5->1) (at T depth)

For depth T = 2:

the hidden states of the bonds @ 1st iteration will be => (0)h21, (0)h51
the hidden states of the bonds @ 2nd iteration will be => (1)h21, (1)h51

The hidden states of the bonds in the 1st iteration are already known. For the hidden states of the bonds in the 2nd iteration, we follow the criterion that the hidden state of a bond is a function of the initial hidden state of the bond and the messages coming to that bond in that iteration.

Example: (1)h21 = func2( (0)h21 , (1)m21 )

Note

Here func2 is self.W_h.

(1)m21 refers to the messages coming to that bond 2->1 in that 2nd iteration.

The messages coming to a bond in an iteration are the sum of the hidden states of the bonds (from the previous iteration) coming to this bond.

Example: (1)m21 = (0)h32 + (0)h42

2 <— 3
^
|
4

Computing the messages

                     B0      B1      B2      B3      B4      B5      B6      B7      B8
f_ini_atoms_bonds = [(0)h12, (0)h21, (0)h23, (0)h32, (0)h24, (0)h42, (0)h15, (0)h51, h(-1)]

Note

h(-1) is an empty array of the same size as other hidden states of bond states.

            B0      B1      B2      B3      B4      B5      B6      B7       B8
mapping = [ [-1,B7] [B3,B5] [B0,B5] [-1,-1] [B0,B3] [-1,-1] [B1,-1] [-1,-1]  [-1,-1] ]

In each iteration, up to the T-th, the encoder uses mapping to gather the current bond hidden states (initially f_ini_atoms_bonds) at the listed indices.

Next, the encoder sums the gathered hidden states for each bond index.

                (1)m12           (1)m21           (1)m23              (1)m32          (1)m24           (1)m42           (1)m15          (1)m51            m(-1)
message = [ [h(-1) + (0)h51] [(0)h32 + (0)h42] [(0)h12 + (0)h42] [h(-1) + h(-1)] [(0)h12 + (0)h32] [h(-1) + h(-1)] [(0)h21 + h(-1)] [h(-1) + h(-1)]  [h(-1) + h(-1)] ]

This is how the encoder obtains the messages for the message-passing steps.
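The gather-and-sum on the example molecule can be reproduced in NumPy with random hidden states. The sketch assumes, as the f_ini_atoms_bonds layout shows, that the all-zero padding state h(-1) is stored in the last row, so NumPy's negative index -1 selects it automatically. It covers only the message computation, not the W_h update.

```python
import numpy as np

# Rows B0..B7 are the directed bonds 12, 21, 23, 32, 24, 42, 15, 51;
# row B8 is the all-zero padding state h(-1).
hidden_size = 3
rng = np.random.default_rng(0)
h0 = np.vstack([rng.normal(size=(8, hidden_size)), np.zeros((1, hidden_size))])

# mapping[k] lists the incoming bonds of bond k (reverse bond excluded); -1 means
# "no bond", and negative indexing sends it to the last row, i.e. h(-1).
mapping = np.array([[-1, 7], [3, 5], [0, 5], [-1, -1], [0, 3],
                    [-1, -1], [1, -1], [-1, -1], [-1, -1]])

# Gather the hidden states named by mapping, then sum them for each bond index.
message = h0[mapping].sum(axis=1)   # shape (9, hidden_size)
```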

Get new atom hidden states and readout phase

Hence now for h1:

h1 = func3(
            concat(
                 a1,
                 [
                    func2( (0)h21 , (0)h32 + (0)h42 ) +
                    func2( (0)h51 , 0               )
                 ]
                )
         )

Similarly, h2, h3, h4 and h5 are calculated.

Next, the atom hidden states of each molecule are aggregated into a single feature vector of the molecule, using the method selected by the aggregation parameter (‘mean’, ‘sum’ or ‘norm’):

mol_encodings = [[aggregate(h1, h2, h3, h4, h5)]]

Concatenate the global features

Let global_features = [[gf1, gf2, gf3]]. This array contains molecule-level features; in this example, it contains 3 global features.

Hence, after concatenation,

mol_encodings = [[aggregate(h1, h2, h3, h4, h5), gf1, gf2, gf3]] (final output of the encoder)
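The aggregation parameter documented below controls how the atom hidden states of each molecule are pooled into one vector before the global features are appended. A NumPy sketch of that readout, assuming the options behave as their names suggest (‘mean’ averages, ‘sum’ adds, ‘norm’ divides the sum by aggregation_norm) and that molecules_unbatch_key lists the atom count of each molecule; readout is a hypothetical helper.

```python
import numpy as np


def readout(atom_hidden_states, molecules_unbatch_key, aggregation="mean",
            aggregation_norm=100):
    """Pool atom hidden states per molecule ('mean', 'sum' or 'norm')."""
    split_points = np.cumsum(molecules_unbatch_key)[:-1]
    pooled = []
    for states in np.split(atom_hidden_states, split_points):
        total = states.sum(axis=0)
        if aggregation == "mean":
            pooled.append(total / len(states))
        elif aggregation == "sum":
            pooled.append(total)
        elif aggregation == "norm":
            pooled.append(total / aggregation_norm)
        else:
            raise ValueError(f"unknown aggregation: {aggregation}")
    return np.stack(pooled)


# Two molecules in one batch: 5 atoms (as in the example above) and 2 atoms.
h = np.arange(14, dtype=float).reshape(7, 2)
mol_encodings = readout(h, [5, 2])
global_features = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
encoder_output = np.concatenate([mol_encodings, global_features], axis=1)
```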

References

Examples

>>> from rdkit import Chem
>>> import torch
>>> import deepchem as dc
>>> input_smile = "CC"
>>> feat = dc.feat.DMPNNFeaturizer(features_generators=['morgan'])
>>> graph = feat.featurize(input_smile)
>>> from deepchem.models.torch_models.dmpnn import _MapperDMPNN
>>> mapper = _MapperDMPNN(graph[0])
>>> atom_features, f_ini_atoms_bonds, atom_to_incoming_bonds, mapping, global_features = mapper.values
>>> atom_features = torch.from_numpy(atom_features).float()
>>> f_ini_atoms_bonds = torch.from_numpy(f_ini_atoms_bonds).float()
>>> atom_to_incoming_bonds = torch.from_numpy(atom_to_incoming_bonds)
>>> mapping = torch.from_numpy(mapping)
>>> global_features = torch.from_numpy(global_features).float()
>>> molecules_unbatch_key = len(atom_features)
>>> from deepchem.models.torch_models.layers import DMPNNEncoderLayer
>>> layer = DMPNNEncoderLayer(d_hidden=2)
>>> output = layer(atom_features, f_ini_atoms_bonds, atom_to_incoming_bonds, mapping, global_features, molecules_unbatch_key)

__init__(use_default_fdim: bool = True, atom_fdim: int = 133, bond_fdim: int = 14, d_hidden: int = 300, depth: int = 3, bias: bool = False, activation: str = 'relu', dropout_p: float = 0.0, aggregation: str = 'mean', aggregation_norm: int | float = 100)

Initialize a DMPNNEncoderLayer layer.

Parameters:

use_default_fdim (bool) – If True, self.atom_fdim and self.bond_fdim are initialized using values from the GraphConvConstants class. If False, self.atom_fdim and self.bond_fdim are initialized from the values provided.
atom_fdim (int) – Dimension of atom feature vector.
bond_fdim (int) – Dimension of bond feature vector.
d_hidden (int) – Size of hidden layer in the encoder layer.
depth (int) – Number of message-passing steps.
bias (bool) – If True, dense layers will use bias vectors.
activation (str) – Activation function to be used in the encoder layer. Can choose between ‘relu’ for ReLU, ‘leakyrelu’ for LeakyReLU, ‘prelu’ for PReLU, ‘tanh’ for TanH, ‘selu’ for SELU, and ‘elu’ for ELU.
dropout_p (float) – Dropout probability for the encoder layer.
aggregation (str) – Aggregation type to be used in the encoder layer. Can choose between ‘mean’, ‘sum’, and ‘norm’.
aggregation_norm (Union[int, float]) – Value required if aggregation type is ‘norm’.

forward(atom_features: Tensor, f_ini_atoms_bonds: Tensor, atom_to_incoming_bonds: Tensor, mapping: Tensor, global_features: Tensor, molecules_unbatch_key: List) → Tensor

Output computation for the DMPNNEncoderLayer.

Steps:

Get original bond hidden states from concatenation of initial atom and bond features. (input)
Get initial messages hidden states. (message)
Execute message passing step for self.depth - 1 iterations.
Get atom hidden states using atom features and message hidden states.
Get molecule encodings.
Concatenate global molecular features and molecule encodings.

Parameters:

atom_features (torch.Tensor) – Tensor containing atoms features.
f_ini_atoms_bonds (torch.Tensor) – Tensor containing concatenated feature vector which contains concatenation of initial atom and bond features.
atom_to_incoming_bonds (torch.Tensor) – Tensor containing the mapping from atom index to the list of indices of incoming bonds.
mapping (torch.Tensor) – Tensor containing the mapping that maps bond index to ‘array of indices of the bonds’ incoming at the initial atom of the bond (excluding the reverse bonds).
global_features (torch.Tensor) – Tensor containing molecule features.
molecules_unbatch_key (List) – List containing the number of atoms in each molecule of the batch.

Returns:

output – Tensor containing the encodings of the molecules.

Return type:

torch.Tensor

class InfoGraphEncoder(num_features, edge_features, embedding_dim)

The encoder for the InfoGraph model. It is a message passing graph convolutional network that produces encoded representations for molecular graph inputs.

Parameters:

num_features (int) – Number of node features for each input
edge_features (int) – Number of edge features for each input
embedding_dim (int) – Dimension of the embedding

Example

>>> import numpy as np
>>> from deepchem.models.torch_models.infograph import InfoGraphEncoder
>>> from deepchem.feat.graph_data import GraphData
>>> encoder = InfoGraphEncoder(num_features=25, edge_features=10, embedding_dim=32)
>>> node_features = np.random.randn(10, 25)
>>> edge_index = np.array([[0, 1, 2], [1, 2, 3]])
>>> edge_features = np.random.randn(3, 10)
>>> graph_index = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> data = GraphData(node_features=node_features, edge_index=edge_index, edge_features=edge_features, graph_index=graph_index).numpy_to_torch()
>>> embedding, feature_map = encoder(data)
>>> print(embedding.shape)
torch.Size([1, 64])

__init__(num_features, edge_features, embedding_dim)

Initialize an InfoGraphEncoder.

forward(data)

Encode input graphs into an embedding and feature map.

Parameters:

data (Union[BatchGraphData, GraphData]) – Contains information about graphs.

Returns:

torch.Tensor – Encoded tensor of input data.
torch.Tensor – Feature map tensor of input data.

class GINEncoder(num_features: int, embedding_dim: int, num_gc_layers: int = 5)

Graph Isomorphism Network (GIN) encoder. This is a graph convolutional network that produces encoded representations for molecular graph inputs.

Parameters:

num_features (int) – The number of node features
embedding_dim (int) – The dimension of the output embedding
num_gc_layers (int, optional (default 5)) – The number of graph convolutional layers to use
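The GIN update of Xu et al. sums the features of each node's neighbours, adds (1 + eps) times the node's own features, and passes the result through an MLP. A NumPy sketch of one such layer, with ReLU standing in for the learned MLP (a hypothetical simplification); how GINEncoder stacks num_gc_layers of them and pools the result is not shown.

```python
import numpy as np


def gin_layer(h, edge_index, mlp, eps=0.0):
    """h_v' = MLP((1 + eps) * h_v + sum of h_u over nodes u with an edge u -> v)."""
    src, dst = edge_index
    agg = np.zeros_like(h)
    np.add.at(agg, dst, h[src])     # sum the messages arriving at each node
    return mlp((1.0 + eps) * h + agg)


h = np.eye(4)                        # 4 nodes with one-hot features
edge_index = np.array([[0, 1, 2], [1, 2, 3]])
out = gin_layer(h, edge_index, mlp=lambda z: np.maximum(z, 0.0))
```

Summing (rather than averaging) neighbour features is the design choice that lets GIN distinguish neighbourhoods of different sizes.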

Example

>>> import numpy as np
>>> from deepchem.models.torch_models.infograph import GINEncoder
>>> from deepchem.feat.graph_data import GraphData
>>> encoder = GINEncoder(num_features=25, embedding_dim=32)
>>> node_features = np.random.randn(10, 25)
>>> edge_index = np.array([[0, 1, 2], [1, 2, 3]])
>>> edge_features = np.random.randn(3, 10)
>>> graph_index = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>>> data = GraphData(node_features=node_features, edge_index=edge_index, edge_features=edge_features, graph_index=graph_index).numpy_to_torch()
>>> embedding, intermediate_embeddings = encoder(data)
>>> print(embedding.shape)
torch.Size([1, 30])

References

__init__(num_features: int, embedding_dim: int, num_gc_layers: int = 5)

Initialize a GINEncoder.

forward(data)

Encodes the input graph data.

Parameters:: data (BatchGraphData) – The batched input graph data.
Returns:: A tuple containing the encoded representation and intermediate embeddings.
Return type:: Tuple[torch.Tensor, torch.Tensor]

class SetGather(M: int, batch_size: int, n_hidden: int = 100, init='orthogonal', **kwargs)

Set2set gather layer for graph-based models.

Models using this layer must set pad_batches=True.

PyTorch equivalent of the Keras SetGather layer.

Parameters:

M (int) – Number of LSTM steps.
batch_size (int) – Number of samples in a batch (all batches must have the same size).
n_hidden (int, optional) – Number of hidden units in the passing phase.

Examples

>>> import deepchem as dc
>>> import numpy as np
>>> from deepchem.models.torch_models import layers
>>> total_n_atoms = 4
>>> n_atom_feat = 4
>>> atom_feat = np.random.rand(total_n_atoms, n_atom_feat)
>>> atom_split = np.array([0, 0, 1, 1], dtype=np.int32)
>>> gather = layers.SetGather(2, 2, n_hidden=4)
>>> output_molecules = gather([atom_feat, atom_split])
>>> print(output_molecules.shape)
torch.Size([2, 8])

__init__(M: int, batch_size: int, n_hidden: int = 100, init='orthogonal', **kwargs)

Initialize a SetGather layer.

forward(inputs: List) → Tensor

Perform M steps of set2set gather.

Detailed descriptions in: https://arxiv.org/abs/1511.06391

Parameters:: inputs (List) – A list of two elements: atom_features (np.ndarray), the features of all atoms in the batch, and atom_split (np.ndarray), the index of the molecule that each atom belongs to.
Returns:: q_star – Final state of the model after all M steps.
Return type:: torch.Tensor
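Each set2set step (following the paper linked above) updates a query with an LSTM, attends over the atoms of each molecule with a softmax of dot products, and concatenates the query with the attention-weighted sum of atom features. A NumPy sketch of those M steps, assuming the LSTM hidden size equals the atom feature size (as in the example above, where both are 4). The LSTM weight layout and the small random initialization are assumptions and differ from the layer's orthogonal initialization.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell(inp, h, c, W, U, b):
    """One LSTM step; gates are packed as [input, forget, output, candidate]."""
    z = inp @ W + h @ U + b
    i, f, o, g = np.split(z, 4, axis=1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c


def set2set(atom_feat, atom_split, n_mols, M, params):
    """Set2set readout: returns q_star of shape (n_mols, 2 * n_feat)."""
    W, U, b = params
    n_feat = atom_feat.shape[1]
    h, c = np.zeros((n_mols, n_feat)), np.zeros((n_mols, n_feat))
    q_star = np.zeros((n_mols, 2 * n_feat))
    for _ in range(M):
        h, c = lstm_cell(q_star, h, c, W, U, b)   # q_t = LSTM(q*_{t-1})
        q = h
        # Attention logits: dot product between each atom and its molecule's query.
        e = np.sum(atom_feat * q[atom_split], axis=1)
        a = np.empty_like(e)
        for m in range(n_mols):                    # softmax over each molecule's atoms
            idx = atom_split == m
            w = np.exp(e[idx] - e[idx].max())
            a[idx] = w / w.sum()
        r = np.zeros((n_mols, n_feat))
        np.add.at(r, atom_split, a[:, None] * atom_feat)  # attention-weighted sum
        q_star = np.concatenate([q, r], axis=1)
    return q_star


rng = np.random.default_rng(0)
n_feat = 4
params = (rng.normal(scale=0.1, size=(2 * n_feat, 4 * n_feat)),
          rng.normal(scale=0.1, size=(n_feat, 4 * n_feat)),
          np.zeros(4 * n_feat))
atom_feat = np.array([[1., 2., 3., 4.], [1., 2., 3., 4.],
                      [0., 1., 0., 1.], [2., 0., 2., 0.]])
atom_split = np.array([0, 0, 1, 1])
q_star = set2set(atom_feat, atom_split, n_mols=2, M=2, params=params)
```

The output has 2 * n_feat columns per molecule, matching the shape (2, 8) in the example above; the second half of each row is a convex combination of that molecule's atom features.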

class GNN(node_type_embedding, chirality_embedding, gconvs, batch_norms, dropout, jump_knowledge, init_emb=False)

GNN module for the GNNModular model.

This module is responsible for the graph neural network layers in the GNNModular model.

Parameters:

node_type_embedding (torch.nn.Embedding) – Embedding layer for node types.
chirality_embedding (torch.nn.Embedding) – Embedding layer for chirality tags.
gconvs (torch.nn.ModuleList) – ModuleList of graph convolutional layers.
batch_norms (torch.nn.ModuleList) – ModuleList of batch normalization layers.
dropout (float) – Dropout probability.
jump_knowledge (str) – The type of jump knowledge to use. [1] Must be one of “last”, “sum”, “max”, “concat” or “none”. “last”: Use the node representation from the last GNN layer. “concat”: Concatenate the node representations from all GNN layers. “max”: Take the element-wise maximum of the node representations from all GNN layers. “sum”: Take the element-wise sum of the node representations from all GNN layers.
init_emb (bool) – Whether to initialize the embedding layers with Xavier uniform initialization.
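The jump-knowledge options described above combine the per-layer node representations in different ways. A NumPy sketch of the four documented modes; jump_knowledge is a hypothetical helper, and the "none" option is not covered because its behaviour is not described here.

```python
import numpy as np


def jump_knowledge(layer_outputs, mode="last"):
    """Combine per-layer node representations, each of shape (n_nodes, dim)."""
    if mode == "last":
        return layer_outputs[-1]
    if mode == "concat":
        return np.concatenate(layer_outputs, axis=1)
    if mode == "max":
        return np.max(np.stack(layer_outputs), axis=0)
    if mode == "sum":
        return np.sum(np.stack(layer_outputs), axis=0)
    raise ValueError(f"unsupported mode: {mode}")


h1 = np.array([[1., 5.], [2., 0.]])   # node representations after GNN layer 1
h2 = np.array([[3., 1.], [0., 4.]])   # node representations after GNN layer 2
combined = jump_knowledge([h1, h2], "max")
```

Only "concat" changes the feature dimension (it multiplies it by the number of layers); the other modes keep it fixed.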

References

Example

>>> from deepchem.models.torch_models.gnn import GNNModular
>>> from deepchem.feat.graph_data import BatchGraphData
>>> from deepchem.feat.molecule_featurizers import SNAPFeaturizer
>>> featurizer = SNAPFeaturizer()
>>> smiles = ["C1=CC=CC=C1", "C1=CC=CC=C1C=O", "C1=CC=CC=C1C(=O)O"]
>>> features = featurizer.featurize(smiles)
>>> modular = GNNModular(emb_dim = 8, task = "edge_pred")
>>> batched_graph = BatchGraphData(features).numpy_to_torch(device=modular.device)
>>> gnnmodel = modular.gnn
>>> print(gnnmodel(batched_graph)[0].shape)
torch.Size([23, 8])

__init__(node_type_embedding, chirality_embedding, gconvs, batch_norms, dropout, jump_knowledge, init_emb=False): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(data: BatchGraphData)

Forward pass for the GNN module.

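Parameters:: data (BatchGraphData) – Batched graph data.
Returns:: A tuple (node_representation, data), where node_representation is the torch.Tensor of node embeddings produced by the GNN layers.
Return type:: tuple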

class GNNHead(pool, head, task, num_tasks, num_classes)

Prediction head module for the GNNModular model.

Parameters:

pool (Union[function,torch.nn.Module]) – Pooling function or nn.Module to use
head (torch.nn.Module) – Prediction head to use
task (str) – The type of task. Must be one of “regression”, “classification”.
num_tasks (int) – Number of tasks.
num_classes (int) – Number of classes for classification.

__init__(pool, head, task, num_tasks, num_classes): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(data)

Forward pass for the GNN head module.

Parameters:: data (tuple) – A tuple (node_representation, input_batch), where node_representation is a torch.Tensor created by passing the input through the GNN layers and input_batch is the original input BatchGraphData.

class LocalGlobalDiscriminator(hidden_dim)

This discriminator module is a bias-free bilinear layer, used to measure the similarity between local node representations (x) and global graph representations (summary).

The goal of the discriminator is to distinguish between positive and negative pairs of local and global representations.

Examples

>>> import torch
>>> from deepchem.models.torch_models.gnn import LocalGlobalDiscriminator
>>> discriminator = LocalGlobalDiscriminator(hidden_dim=64)
>>> x = torch.randn(32, 64)  # Local node representations
>>> summary = torch.randn(32, 64)  # Global graph representations
>>> similarity_scores = discriminator(x, summary)
>>> print(similarity_scores.shape)
torch.Size([32])

__init__(hidden_dim)

self.weight is a learnable weight matrix of shape (hidden_dim, hidden_dim).

It is stored as an nn.Parameter, i.e. a tensor that requires gradients and is optimized during training.

Parameters:: hidden_dim (int) – The size of the hidden dimension for the weight matrix.

forward(x, summary)

Computes the product of summary and self.weight, and then calculates the element-wise product of x and the resulting matrix h. It then sums over the hidden_dim dimension, resulting in a tensor of shape (batch_size,), which represents the similarity scores between the local and global representations.

Parameters:

x (torch.Tensor) – Local node representations of shape (batch_size, hidden_dim).
summary (torch.Tensor) – Global graph representations of shape (batch_size, hidden_dim).

Returns:

A tensor of shape (batch_size,), representing the similarity scores between the local and global representations.

Return type:

torch.Tensor
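The scoring rule in forward can be written out in NumPy as a sketch (the function name discriminator_scores is hypothetical). Each score equals the bilinear form summary_bᵀ W x_b for one pair, which is the same as np.einsum("bi,ij,bj->b", summary, weight, x).

```python
import numpy as np


def discriminator_scores(x, summary, weight):
    """Sketch of the scoring rule described above.

    x, summary: (batch_size, hidden_dim); weight: (hidden_dim, hidden_dim).
    """
    h = summary @ weight          # product of summary and the weight matrix
    return np.sum(x * h, axis=1)  # element-wise product, summed over hidden_dim
```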

class AtomEncoder(emb_dim, padding=False)

Encodes atom features into embeddings based on the Open Graph Benchmark feature set in conformer_featurizer.

Parameters:

emb_dim (int) – The dimension that the returned embedding will have.
padding (bool, optional (default=False)) – If True, the last index is used for padding.

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import AtomEncoder
>>> from deepchem.feat.molecule_featurizers.conformer_featurizer import full_atom_feature_dims
>>> atom_encoder = AtomEncoder(emb_dim=32)
>>> num_rows = 10
>>> atom_features = torch.stack([
... torch.randint(low=0, high=dim, size=(num_rows,))
... for dim in full_atom_feature_dims
... ], dim=1)
>>> atom_embeddings = atom_encoder(atom_features)

__init__(emb_dim, padding=False): Initialize internal Module state, shared by both nn.Module and ScriptModule.

reset_parameters()

Reset the parameters of the atom embeddings.

This method resets the weights of the atom embeddings by initializing them with a uniform distribution between -sqrt(3) and sqrt(3).
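The bound sqrt(3) gives the uniform initialization unit variance, since a uniform distribution on [a, b] has variance (b - a)^2 / 12 = (2 sqrt(3))^2 / 12 = 1. A quick NumPy check of this arithmetic:

```python
import numpy as np

# A uniform distribution on [a, b] has variance (b - a)**2 / 12.
a, b = -np.sqrt(3.0), np.sqrt(3.0)
analytic_var = (b - a) ** 2 / 12  # (2*sqrt(3))**2 / 12 = 12 / 12 = 1

# Empirical check with many samples drawn from U(-sqrt(3), sqrt(3)).
rng = np.random.default_rng(0)
samples = rng.uniform(a, b, size=200_000)
empirical_var = samples.var()
```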

forward(x)

Compute the atom embeddings for the given atom features.

Parameters:: x (torch.Tensor, shape (num_atoms, num_features)) – The input atom features tensor.
Returns:: x_embedding – The computed atom embeddings.
Return type:: torch.Tensor, shape (num_atoms, emb_dim)

class BondEncoder(emb_dim, padding=False)

Encodes bond features into embeddings based on the Open Graph Benchmark feature set in conformer_featurizer.

Parameters:

emb_dim (int) – The dimension that the returned embedding will have.
padding (bool, optional (default=False)) – If True, the last index is used for padding.

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import BondEncoder
>>> from deepchem.feat.molecule_featurizers.conformer_featurizer import full_bond_feature_dims
>>> bond_encoder = BondEncoder(emb_dim=32)
>>> num_rows = 10
>>> bond_features = torch.stack([
... torch.randint(low=0, high=dim, size=(num_rows,))
... for dim in full_bond_feature_dims
... ], dim=1)
>>> bond_embeddings = bond_encoder(bond_features)

__init__(emb_dim, padding=False): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(edge_attr)

Compute the bond embeddings for the given bond features.

Parameters:: edge_attr (torch.Tensor, shape (num_edges, num_features)) – The input bond features tensor.
Returns:: bond_embedding – The computed bond embeddings.
Return type:: torch.Tensor, shape (num_edges, emb_dim)

class PNALayer(in_dim: int, out_dim: int, in_dim_edges: int, aggregators: List[str], scalers: List[str], activation: Callable | str = 'relu', dropout: float = 0.0, residual: bool = True, pairwise_distances: bool = False, batch_norm: bool = True, batch_norm_momentum=0.1, avg_d: Dict[str, float] = {'log': 1.0}, posttrans_layers: int = 2, pretrans_layers: int = 1)

Principal Neighbourhood Aggregation Layer (PNA) from [1].

Parameters:

in_dim (int) – Input dimension of the node features.
out_dim (int) – Output dimension of the node features.
in_dim_edges (int) – Input dimension of the edge features.
aggregators (List[str]) – List of aggregator functions to use. Options are “mean”, “sum”, “max”, “min”, “std”, “var”, “moment3”, “moment4”, “moment5”.
scalers (List[str]) – List of scaler functions to use. Options are “identity”, “amplification”, “attenuation”.
activation (Union[Callable, str], optional, default="relu") – Activation function to use.
last_activation (Union[Callable, str], optional, default="none") – Last activation function to use. Note that this parameter does not appear in the signature above.
dropout (float, optional, default=0.0) – Dropout rate.
residual (bool, optional, default=True) – Whether to use residual connections.
pairwise_distances (bool, optional, default=False) – Whether to use pairwise distances.
batch_norm (bool, optional, default=True) – Whether to use batch normalization.
batch_norm_momentum (float, optional, default=0.1) – Momentum for the batch normalization layers.
avg_d (Dict[str, float], optional, default={"log": 1.0}) – Dictionary containing the average degree of the graph.
posttrans_layers (int, optional, default=2) – Number of post-transformation layers.
pretrans_layers (int, optional, default=1) – Number of pre-transformation layers.
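To illustrate how aggregators and scalers combine, here is a NumPy sketch for a single node following the formulation in the PNA paper (Corso et al., 2020): amplification multiplies by log(d + 1) / δ and attenuation by its inverse, where d is the node degree and δ is the average of log(d + 1) over the training graphs. Treating avg_d["log"] as δ is an assumption about how DeepChem uses that parameter, and the function name pna_aggregate is hypothetical.

```python
import numpy as np


def pna_aggregate(messages, delta):
    """Sketch of PNA aggregation for a single node (not DeepChem's code).

    messages: (degree, feat_dim) array, one row per incoming edge.
    delta: average of log(degree + 1) over the training graphs.
    """
    degree = messages.shape[0]
    aggregated = [messages.mean(axis=0), messages.max(axis=0)]  # "mean", "max"
    s = np.log(degree + 1) / delta
    scalers = [1.0, s, 1.0 / s]  # "identity", "amplification", "attenuation"
    # every scaler is applied to every aggregator and the results are concatenated
    return np.concatenate([k * agg for k in scalers for agg in aggregated])
```

With two aggregators and three scalers, as in the example below, the concatenated vector is six times the message width. When log(d + 1) equals δ, all three scalers coincide.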

References

Examples

>>> import dgl
>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.pna_gnn import PNALayer
>>> in_dim = 32
>>> out_dim = 64
>>> in_dim_edges = 16
>>> aggregators = ["mean", "max"]
>>> scalers = ["identity", "amplification", "attenuation"]
>>> pna_layer = PNALayer(in_dim=in_dim,
...                      out_dim=out_dim,
...                      in_dim_edges=in_dim_edges,
...                      aggregators=aggregators,
...                      scalers=scalers)
>>> num_nodes = 10
>>> num_edges = 20
>>> node_features = torch.randn(num_nodes, in_dim)
>>> edge_features = torch.randn(num_edges, in_dim_edges)
>>> g = dgl.graph((np.random.randint(0, num_nodes, num_edges),
...                np.random.randint(0, num_nodes, num_edges)))
>>> g.ndata['feat'] = node_features
>>> g.edata['feat'] = edge_features
>>> g.ndata['feat'] = pna_layer(g)

__init__(in_dim: int, out_dim: int, in_dim_edges: int, aggregators: List[str], scalers: List[str], activation: Callable | str = 'relu', dropout: float = 0.0, residual: bool = True, pairwise_distances: bool = False, batch_norm: bool = True, batch_norm_momentum=0.1, avg_d: Dict[str, float] = {'log': 1.0}, posttrans_layers: int = 2, pretrans_layers: int = 1): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(g)

Forward pass of the PNA layer.

Parameters:: g (dgl.DGLGraph) – Input graph
Returns:: h – Node feature tensor
Return type:: torch.Tensor

message_func(edges) → Dict[str, Tensor]

The message function to generate messages along the edges.

Parameters:: edges (dgl.EdgeBatch) – Batch of edges.
Returns:: Dictionary containing the edge features.
Return type:: Dict[str, torch.Tensor]

reduce_func(nodes) → Dict[str, Tensor]

The reduce function to aggregate the messages. Apply the aggregators and scalers, and concatenate the results.

Parameters:: nodes (dgl.NodeBatch) – Batch of nodes.
Returns:: Dictionary containing the aggregated node features.
Return type:: Dict[str, torch.Tensor]

pretrans_edges(edges) → Dict[str, Tensor]

Return a mapping to the concatenation of the features from the source node, the destination node, and the edge between them (if applicable).

Parameters:: edges (dgl.EdgeBatch) – Batch of edges.
Returns:: Dictionary containing the concatenated features.
Return type:: Dict[str, torch.Tensor]

class PNAGNN(hidden_dim, aggregators: List[str], scalers: List[str], residual: bool = True, pairwise_distances: bool = False, activation: Callable | str = 'relu', batch_norm: bool = True, batch_norm_momentum=0.1, propagation_depth: int = 5, dropout: float = 0.0, posttrans_layers: int = 1, pretrans_layers: int = 1, **kwargs)

Principal Neighbourhood Aggregation Graph Neural Network [1]. This defines the message passing layers of the PNA model.

Parameters:

hidden_dim (int) – Dimension of the hidden layers.
aggregators (List[str]) – List of aggregator functions to use.
scalers (List[str]) – List of scaler functions to use. Options are “identity”, “amplification”, “attenuation”.
residual (bool, optional, default=True) – Whether to use residual connections.
pairwise_distances (bool, optional, default=False) – Whether to use pairwise distances.
activation (Union[Callable, str], optional, default="relu") – Activation function to use.
batch_norm (bool, optional, default=True) – Whether to use batch normalization in the layers before the aggregator.
batch_norm_momentum (float, optional, default=0.1) – Momentum for the batch normalization layers.
propagation_depth (int, optional, default=5) – Number of propagation layers.
dropout (float, optional, default=0.0) – Dropout rate.
posttrans_layers (int, optional, default=1) – Number of post-transformation layers.
pretrans_layers (int, optional, default=1) – Number of pre-transformation layers.

References

Examples

>>> import numpy as np
>>> from deepchem.feat.molecule_featurizers.conformer_featurizer import RDKitConformerFeaturizer
>>> from deepchem.feat.graph_data import BatchGraphData
>>> from deepchem.models.torch_models.pna_gnn import PNAGNN
>>> featurizer = RDKitConformerFeaturizer()
>>> smiles = ['C1=CC=NC=C1', 'CC(=O)C', 'C']
>>> data = featurizer.featurize(smiles)
>>> features = BatchGraphData(data)
>>> features = features.to_dgl_graph()
>>> model = PNAGNN(hidden_dim=16,
...                aggregators=['mean', 'sum'],
...                scalers=['identity'])
>>> output = model(features)

__init__(hidden_dim, aggregators: List[str], scalers: List[str], residual: bool = True, pairwise_distances: bool = False, activation: Callable | str = 'relu', batch_norm: bool = True, batch_norm_momentum=0.1, propagation_depth: int = 5, dropout: float = 0.0, posttrans_layers: int = 1, pretrans_layers: int = 1, **kwargs): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input_graph: DGLGraph) → DGLGraph

Forward pass of the PNAGNN model.

Parameters:: input_graph (dgl.DGLGraph) – Input graph with node and edge features.
Returns:: graph – Output graph with updated node features after applying the message passing layers.
Return type:: dgl.DGLGraph

class PNA(hidden_dim: int, target_dim: int, task: str, aggregators: List[str] = ['mean'], scalers: List[str] = ['identity'], readout_aggregators: List[str] = ['mean'], readout_hidden_dim: int = 1, readout_layers: int = 2, residual: bool = True, pairwise_distances: bool = False, activation: Callable | str = 'relu', batch_norm: bool = True, batch_norm_momentum: float = 0.1, propagation_depth: int = 5, dropout: float = 0.0, posttrans_layers: int = 1, pretrans_layers: int = 1, n_tasks: int = 1, n_classes: int = 2, **kwargs)

Message passing neural network for graph representation learning [1].

Parameters:

hidden_dim (int) – Hidden dimension size.
target_dim (int) – Dimensionality of the output, for example for binary classification target_dim = 1.
aggregators (List[str]) – Type of message passing functions. Options are ‘mean’, ‘sum’, ‘max’, ‘min’, ‘std’, ‘var’, ‘moment3’, ‘moment4’, ‘moment5’.
scalers (List[str]) – Degree-based scalers applied to the aggregated messages in the message passing network. Options are ‘identity’, ‘amplification’, ‘attenuation’.
readout_aggregators (List[str]) – Type of aggregators in the readout network.
readout_hidden_dim (int, default 1) – The dimension of the hidden layer in the readout network. If not provided, the readout has the same dimensionality as the final PNA layer, i.e. the hidden dimension size.
readout_layers (int, default 2) – The number of linear layers in the readout network.
residual (bool, default True) – Whether to use residual connections.
pairwise_distances (bool, default False) – Whether to use pairwise distances.
activation (Union[Callable, str], default 'relu') – Activation function to use.
batch_norm (bool, default True) – Whether to use batch normalization in the layers before the aggregator.
batch_norm_momentum (float, default 0.1) – Momentum for the batch normalization layers.
propagation_depth (int, default 5) – Number of propagation layers.
dropout (float, default 0.0) – Dropout probability in the message passing layers.
posttrans_layers (int, default 1) – Number of post-transformation layers.
pretrans_layers (int, default 1) – Number of pre-transformation layers.

References

Examples

>>> import numpy as np
>>> from deepchem.feat.graph_data import BatchGraphData
>>> from deepchem.models.torch_models.pna_gnn import PNA
>>> from deepchem.feat.molecule_featurizers.conformer_featurizer import RDKitConformerFeaturizer
>>> smiles = ["C1=CC=CN=C1", "C1CCC1"]
>>> featurizer = RDKitConformerFeaturizer()
>>> data = featurizer.featurize(smiles)
>>> features = BatchGraphData(data)
>>> features = features.to_dgl_graph()
>>> target_dim = 1
>>> model = PNA(hidden_dim=16, target_dim=target_dim, task='regression')
>>> output = model(features)
>>> print(output.shape)
torch.Size([1, 1])

__init__(hidden_dim: int, target_dim: int, task: str, aggregators: List[str] = ['mean'], scalers: List[str] = ['identity'], readout_aggregators: List[str] = ['mean'], readout_hidden_dim: int = 1, readout_layers: int = 2, residual: bool = True, pairwise_distances: bool = False, activation: Callable | str = 'relu', batch_norm: bool = True, batch_norm_momentum: float = 0.1, propagation_depth: int = 5, dropout: float = 0.0, posttrans_layers: int = 1, pretrans_layers: int = 1, n_tasks: int = 1, n_classes: int = 2, **kwargs): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(graph: DGLGraph)

Forward pass of the PNA model.

Parameters:: graph (dgl.DGLGraph) – Input graph.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class Net3DLayer(edge_dim: int, hidden_dim: int, reduce_func: str = 'sum', batch_norm: bool = False, batch_norm_momentum: float = 0.1, dropout: float = 0.0, message_net_layers: int = 2, update_net_layers: int = 2)

Net3DLayer is a single layer of a 3D graph neural network based on the 3D Infomax architecture [1].

This class expects a DGL graph with node features stored under the name ‘feat’ and edge features stored under the name ‘d’ (representing 3D distances). The edge features are updated by the message network and the node features are updated by the update network.

Parameters:

edge_dim (int) – The dimension of the edge features.
hidden_dim (int) – The dimension of the hidden layers.
reduce_func (str) – The reduce function to use for aggregating messages. Can be either ‘sum’ or ‘mean’.
batch_norm (bool, optional (default=False)) – Whether to use batch normalization.
batch_norm_momentum (float, optional (default=0.1)) – The momentum for the batch normalization layers.
dropout (float, optional (default=0.0)) – The dropout rate for the layers.
mid_activation (str, optional (default='SiLU')) – The activation function to use in the network. Note that this parameter does not appear in the signature above.
message_net_layers (int, optional (default=2)) – The number of message network layers.
update_net_layers (int, optional (default=2)) – The number of update network layers.

References

Examples

>>> import dgl
>>> import torch
>>> from deepchem.models.torch_models.gnn3d import Net3DLayer
>>> net3d_layer = Net3DLayer(edge_dim=3, hidden_dim=3)
>>> graph = dgl.graph(([0, 1], [1, 2]))
>>> graph.ndata['feat'] = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
>>> graph.edata['d'] = torch.tensor([[0.5, 0.6, 0.7], [0.8, 0.9, 1.0]])
>>> output = net3d_layer(graph)

__init__(edge_dim: int, hidden_dim: int, reduce_func: str = 'sum', batch_norm: bool = False, batch_norm_momentum: float = 0.1, dropout: float = 0.0, message_net_layers: int = 2, update_net_layers: int = 2): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input_graph: DGLGraph)

Perform a forward pass on the given graph.

Parameters:: input_graph (dgl.DGLGraph) – The graph to perform the forward pass on.
Returns:: The updated graph after the forward pass.
Return type:: dgl.DGLGraph

message_function(edges)

Computes the message and edge weight for a given set of edges.

Parameters:: edges (dgl.EdgeBatch) – A dgl.EdgeBatch object containing the edge information (data, batch size, etc.).
Returns:: A dictionary containing the message multiplied by the edge weight.
Return type:: dict

update_function(nodes)

Update function for updating node features based on the aggregated messages.

This function is used in the forward method to perform a forward pass on the graph.

Parameters:: nodes (dgl.NodeBatch) – A node batch object containing the node information (data, batch size, etc.).
Returns:: A dictionary containing the updated features.
Return type:: dict
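The message-times-edge-weight scheme and the 'sum'/'mean' reduce options can be illustrated with a small NumPy sketch on a graph like the one in the example above (edges 0 → 1 and 1 → 2). The sigmoid gate, the edge_logits input and the function name gated_aggregate are assumptions for illustration; this section does not say how the edge weight is computed.

```python
import numpy as np


def gated_aggregate(dst, messages, edge_logits, n_nodes, reduce_func="sum"):
    """Sketch: scale each edge message by a weight in (0, 1), then reduce per node.

    dst: (n_edges,) destination node of each edge.
    messages: (n_edges, hidden_dim) edge messages.
    edge_logits: (n_edges,) hypothetical pre-activation edge scores.
    """
    weights = 1.0 / (1.0 + np.exp(-edge_logits))  # sigmoid edge weights (assumption)
    weighted = messages * weights[:, None]        # message multiplied by edge weight
    out = np.zeros((n_nodes, messages.shape[1]))
    np.add.at(out, dst, weighted)                 # sum of incoming messages per node
    if reduce_func == "mean":
        counts = np.bincount(dst, minlength=n_nodes)
        out = out / np.maximum(counts, 1)[:, None]  # nodes without edges stay zero
    return out
```

Node 0 has no incoming edge, so it receives nothing under either reduce function.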

class Net3D(hidden_dim, target_dim, readout_aggregators: List[str], node_wise_output_layers=2, batch_norm=True, batch_norm_momentum=0.1, reduce_func='sum', dropout=0.0, propagation_depth: int = 4, readout_layers: int = 2, readout_hidden_dim=None, fourier_encodings=4, update_net_layers=2, message_net_layers=2, use_node_features=False)

Net3D is a 3D graph neural network that expects a DGL graph input with 3D coordinates stored under the name ‘d’ and node features stored under the name ‘feat’. It is based on the 3D Infomax architecture [1].

Parameters:

hidden_dim (int) – The dimension of the hidden layers.
target_dim (int) – The dimension of the output layer.
readout_aggregators (List[str]) – A list of aggregator functions for the readout layer. Options are ‘sum’, ‘max’, ‘min’, ‘mean’.
batch_norm (bool, optional (default=True)) – Whether to use batch normalization.
node_wise_output_layers (int, optional (default=2)) – The number of output layers for each node.
readout_batchnorm (bool, optional (default=True)) – Whether to use batch normalization in the readout layer. Note that this parameter does not appear in the signature above.
batch_norm_momentum (float, optional (default=0.1)) – The momentum for the batch normalization layers.
reduce_func (str, optional (default='sum')) – The reduce function to use for aggregating messages.
dropout (float, optional (default=0.0)) – The dropout rate for the layers.
propagation_depth (int, optional (default=4)) – The number of propagation layers in the network.
readout_layers (int, optional (default=2)) – The number of readout layers in the network.
readout_hidden_dim (int, optional (default=None)) – The dimension of the hidden layers in the readout network.
fourier_encodings (int, optional (default=4)) – The number of Fourier encodings to use.
activation (str, optional (default='SiLU')) – The activation function to use in the network. Note that this parameter does not appear in the signature above.
update_net_layers (int, optional (default=2)) – The number of update network layers.
message_net_layers (int, optional (default=2)) – The number of message network layers.
use_node_features (bool, optional (default=False)) – Whether to use node features as input.

Examples

>>> from deepchem.feat.molecule_featurizers.conformer_featurizer import RDKitConformerFeaturizer
>>> from deepchem.models.torch_models.gnn3d import Net3D
>>> smiles = ["C[C@H](F)Cl", "C[C@@H](F)Cl"]
>>> featurizer = RDKitConformerFeaturizer()
>>> data = featurizer.featurize(smiles)
>>> dgldata = [graph.to_dgl_graph() for graph in data]
>>> net3d = Net3D(hidden_dim=3, target_dim=2, readout_aggregators=['sum', 'mean'])
>>> output = [net3d(graph) for graph in dgldata]

References

__init__(hidden_dim, target_dim, readout_aggregators: List[str], node_wise_output_layers=2, batch_norm=True, batch_norm_momentum=0.1, reduce_func='sum', dropout=0.0, propagation_depth: int = 4, readout_layers: int = 2, readout_hidden_dim=None, fourier_encodings=4, update_net_layers=2, message_net_layers=2, use_node_features=False): Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(graph: DGLGraph)

Forward pass of the Net3D model.

Parameters:: graph (dgl.DGLGraph) – The input graph with node features stored under the key ‘x’ and edge distances stored under the key ‘d’.
Returns:: The graph representation tensor of shape (1, target_dim).
Return type:: torch.Tensor

output_node_func(nodes)

Apply the node-wise output network to the node features.

Parameters:: nodes (dgl.NodeBatch) – A batch of nodes with features stored under the key ‘feat’.
Returns:: A dictionary with the updated node features under the key ‘feat’.
Return type:: dict

input_edge_func(edges)

Apply the edge input network to the edge features.

Parameters:: edges (dgl.EdgeBatch) – A batch of edges with distances stored under the key ‘d’.
Returns:: A dictionary with the updated edge features under the key ‘d’.
Return type:: dict

class DTNNEmbedding(n_embedding: int = 30, periodic_table_length: int = 30, initalizer: str = 'xavier_uniform_', **kwargs)

DTNNEmbedding layer for DTNN model.

Assign initial atomic descriptors [1].

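This layer creates an embedding table with one entry for each of the periodic_table_length atom types (the total number of unique atoms); each entry has n_embedding features and serves as an initial atomic descriptor. The table is initialized with the requested weight initializer.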

References

[1] Schütt, Kristof T., et al. “Quantum-chemical insights from deep tensor neural networks.” Nature Communications 8.1 (2017): 1-8.

Examples

>>> from deepchem.models.torch_models import layers
>>> import torch
>>> layer = layers.DTNNEmbedding(30, 30, 'xavier_uniform_')
>>> output = layer(torch.tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
>>> output.shape
torch.Size([10, 30])

__init__(n_embedding: int = 30, periodic_table_length: int = 30, initalizer: str = 'xavier_uniform_', **kwargs)

Parameters:

n_embedding (int, optional) – Number of features for each atom
periodic_table_length (int, optional) – Number of entries in the embedding table, i.e. the number of unique atom types (for example, 83 covers elements up to Bi).
initalizer (str, optional) – Weight initialization for filters. Options: {xavier_uniform_, xavier_normal_, kaiming_uniform_, kaiming_normal_, trunc_normal_}

forward(inputs: Tensor)

Returns the embeddings for the given indices.

Parameters:: inputs (torch.Tensor) – Indices of the atoms whose embeddings are requested.
Returns:: atom_embeddings – Embeddings of the atoms according to the indices.
Return type:: torch.Tensor

class DTNNStep(n_embedding: int = 30, n_distance: int = 100, n_hidden: int = 60, initializer: str = 'xavier_uniform_', activation='tanh', **kwargs)

DTNNStep Layer for DTNN model.

Encodes the atom’s interaction with other atoms according to distance relationships [1].

This layer implements Eq (7) from the DTNN paper to compute the interaction terms V_ij, and then sums them using Eq (6) from the DTNN paper to obtain the final output.

Eq (7): V_ij = tanh[W_fc . ((W_cf . C_j + b_cf) * (W_df . d_ij + b_df))]

Eq (6): C_i = C_i + sum(V_ij)

Here ‘.’ denotes matrix multiplication and ‘*’ denotes element-wise multiplication.
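Eq (7) and Eq (6) can be traced with a NumPy sketch. It uses the row-vector convention (x @ W rather than W . x), and the weight names, their shapes and the pairs array of (i, j) atom indices are assumptions for illustration rather than DeepChem's internal layout; the function name dtnn_step is hypothetical.

```python
import numpy as np


def dtnn_step(C, d, pairs, W_cf, b_cf, W_df, b_df, W_fc):
    """Sketch of Eq (7) and Eq (6) using row vectors.

    C: (n_atoms, n_embedding) atom descriptors.
    d: (n_pairs, n_distance) expanded distance vectors d_ij.
    pairs: (n_pairs, 2) int array of (i, j) atom indices.
    W_cf: (n_embedding, n_hidden), W_df: (n_distance, n_hidden),
    W_fc: (n_hidden, n_embedding); b_cf, b_df: (n_hidden,).
    """
    i, j = pairs[:, 0], pairs[:, 1]
    atom_term = C[j] @ W_cf + b_cf               # W_cf . C_j + b_cf
    dist_term = d @ W_df + b_df                  # W_df . d_ij + b_df
    V = np.tanh((atom_term * dist_term) @ W_fc)  # Eq (7): V_ij
    C_new = C.copy()
    np.add.at(C_new, i, V)                       # Eq (6): C_i + sum_j V_ij
    return C_new
```

Setting W_fc to zero makes every V_ij equal tanh(0) = 0, so the step returns the descriptors unchanged.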

References

[1] Schütt, Kristof T., et al. “Quantum-chemical insights from deep tensor neural networks.” Nature Communications 8.1 (2017): 1-8.

Examples

>>> from deepchem.models.torch_models import layers
>>> import torch
>>> embedding_layer = layers.DTNNEmbedding(4, 4)
>>> emb = embedding_layer(torch.Tensor([0,1,2,3]).to(torch.int64))
>>> step_layer = layers.DTNNStep(4, 6, 8)
>>> output_torch = step_layer([
...     torch.Tensor(emb),
...     torch.Tensor([0, 1, 2, 3, 4, 5]).to(torch.float32),
...     torch.Tensor([1]).to(torch.int64),
...     torch.Tensor([[1]]).to(torch.int64)
... ])
>>> output_torch.shape
torch.Size([2, 4, 4])

__init__(n_embedding: int = 30, n_distance: int = 100, n_hidden: int = 60, initializer: str = 'xavier_uniform_', activation='tanh', **kwargs)

Parameters:

n_embedding (int, optional) – Number of features for each atom
n_distance (int, optional) – granularity of distance matrix
n_hidden (int, optional) – Number of nodes in hidden layer
initializer (str, optional) – Weight initialization for filters. Options: {xavier_uniform_, xavier_normal_, kaiming_uniform_, kaiming_normal_, trunc_normal_}
activation (str, optional) – Activation function applied

forward(inputs)

Executes the equations and returns the interaction vector of the atom with other atoms.

Parameters:: inputs (List[torch.Tensor]) – List of tensors: atom_features, distance, distance_membership_i and distance_membership_j.
Returns:: interaction_vector – interaction of the atom with other atoms based on distance and distance_membership.
Return type:: torch.Tensor

class DTNNGather(n_embedding=30, n_outputs=100, layer_sizes=[100], output_activation=True, initializer='xavier_uniform_', activation='tanh', **kwargs)

DTNNGather Layer for DTNN Model.

Predict molecular energy using atom_features and atom_membership [1].

This layer gathers the inputs received from the step layer according to atom_membership and calculates the total molecular energy.

References

[1] Schütt, Kristof T., et al. “Quantum-chemical insights from deep tensor neural networks.” Nature Communications 8.1 (2017): 1-8.

Examples

>>> from deepchem.models.torch_models import layers as layers_torch
>>> import torch
>>> gather_layer_torch = layers_torch.DTNNGather(3, 3, [10])
>>> result = gather_layer_torch([torch.Tensor([[3, 2, 1]]).to(torch.float32), torch.Tensor([0]).to(torch.int64)])
>>> result.shape
torch.Size([1, 3])

__init__(n_embedding=30, n_outputs=100, layer_sizes=[100], output_activation=True, initializer='xavier_uniform_', activation='tanh', **kwargs)

Parameters:

n_embedding (int, optional) – Number of features for each atom
n_outputs (int, optional) – Number of features for each molecule (output)
layer_sizes (list of int, optional (default=[100])) – Structure of hidden layer(s)
initializer (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied

forward(inputs)

Executes the equation and returns molecular energies according to atom_membership.

Parameters:: inputs (List[torch.Tensor]) – List of tensors containing atom_features and atom_membership.
Returns:: molecular_energies – Tensor containing the Molecular Energies according to atom_membership.
Return type:: torch.Tensor
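The final gather step, summing per-atom contributions into one value per molecule according to atom_membership, can be sketched in NumPy as a segment sum. The per-atom outputs are taken as given here; the hidden layers that would produce them in the real layer are omitted, and the function name gather_energies is hypothetical.

```python
import numpy as np


def gather_energies(atom_outputs, atom_membership, n_molecules):
    """Sketch: per-molecule total as the sum of its atoms' contributions.

    atom_outputs: (n_atoms, n_outputs) per-atom values.
    atom_membership: (n_atoms,) molecule index of each atom.
    """
    totals = np.zeros((n_molecules, atom_outputs.shape[1]))
    np.add.at(totals, atom_membership, atom_outputs)  # segment sum by molecule
    return totals
```

For example, four atoms with outputs 1, 2, 3, 4 and membership [0, 0, 1, 1] give molecule totals 3 and 7.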

class GradientPenaltyLayer(gan: WGANModel, discriminator: Module, **kwargs)

Implements the gradient penalty loss term for WGANs.

This class implements the gradient penalty loss term for WGANs as described in Gulrajani et al., “Improved Training of Wasserstein GANs” [wgan2]. It is used internally by WGANModel.
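For reference, the gradient penalty in Gulrajani et al. [wgan2] is evaluated at random interpolates between real and generated samples. The formulation below follows the paper; how DeepChem samples the interpolation weight, handles the conditional inputs or sets the penalty weight is not stated in this section.

```latex
\hat{x} = \epsilon\, x + (1 - \epsilon)\, \tilde{x}, \qquad \epsilon \sim \mathcal{U}[0, 1]

L_{\mathrm{GP}} = \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right]
```

Here x is a real sample, x̃ a generated sample, D the discriminator and λ the penalty weight (the paper uses λ = 10).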

Examples

Importing necessary modules

>>> import deepchem
>>> from deepchem.models.torch_models.gan import WGANModel
>>> from deepchem.models.torch_models import GradientPenaltyLayer
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F

Creating a Generator

>>> class Generator(nn.Module):
...     def __init__(self, noise_input_shape, conditional_input_shape):
...         super(Generator, self).__init__()
...         self.noise_input_shape = noise_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         self.noise_dim = noise_input_shape[1:]
...         self.conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(self.noise_dim) + sum(self.conditional_dim)
...         self.output = nn.Linear(input_dim, 1)
...     def forward(self, input):
...         noise_input, conditional_input = input
...         inputs = torch.cat((noise_input, conditional_input), dim=1)
...         output = self.output(inputs)
...         return output

Creating a Discriminator

>>> class Discriminator(nn.Module):
...     def __init__(self, data_input_shape, conditional_input_shape):
...         super(Discriminator, self).__init__()
...         self.data_input_shape = data_input_shape
...         self.conditional_input_shape = conditional_input_shape
...         # Extracting the actual data dimension
...         data_dim = data_input_shape[1:]
...         # Extracting the actual conditional dimension
...         conditional_dim = conditional_input_shape[1:]
...         input_dim = sum(data_dim) + sum(conditional_dim)
...         # Define the dense layers
...         self.dense1 = nn.Linear(input_dim, 10)
...         self.dense2 = nn.Linear(10, 1)
...     def forward(self, input):
...         data_input, conditional_input = input
...         # Concatenate data_input and conditional_input along the second dimension
...         discrim_in = torch.cat((data_input, conditional_input), dim=1)
...         # Pass the concatenated input through the dense layers
...         x = F.relu(self.dense1(discrim_in))
...         output = self.dense2(x)
...         return output

Creating an Example WGANModel class

>>> class ExampleWGAN(WGANModel):
...     def get_noise_input_shape(self):
...         return (100,2,)
...     def get_data_input_shapes(self):
...         return [(100,1,)]
...     def get_conditional_input_shapes(self):
...         return [(100,1,)]
...     def create_generator(self):
...         noise_dim = self.get_noise_input_shape()
...         conditional_dim = self.get_conditional_input_shapes()[0]
...         return nn.Sequential(Generator(noise_dim, conditional_dim))
...     def create_discriminator(self):
...         data_input_shape = self.get_data_input_shapes()[0]
...         conditional_input_shape = self.get_conditional_input_shapes()[0]
...         return nn.Sequential(
...             Discriminator(data_input_shape, conditional_input_shape))

Defining an Example GradientPenaltyLayer

>>> wgan = ExampleWGAN()
>>> discriminator = wgan.discriminators[0]
>>> gpl = GradientPenaltyLayer(wgan, discriminator)
>>> inputs = [torch.randn(4, 1)]
>>> conditional_inputs = [torch.randn(4, 1)]
>>> output, penalty = gpl(inputs, conditional_inputs)

References

[wgan2]

Gulrajani, Ishaan, et al. “Improved training of Wasserstein GANs.” Advances in Neural Information Processing Systems 30 (2017). (https://arxiv.org/abs/1704.00028)

__init__(gan: WGANModel, discriminator: Module, **kwargs) → None[source]¶

Construct a GradientPenaltyLayer.

Parameters:

gan (WGANModel) – the WGANModel that this layer is part of
discriminator (nn.Module) – the discriminator to compute the gradient penalty for

forward(inputs: list | Tensor, conditional_inputs: Tensor) → list[source]¶

Compute the output of the gradient penalty layer.

Parameters:

inputs (list of Tensor) – the inputs to the discriminator.
conditional_inputs (Tensor) – the conditional inputs to the discriminator.

Returns:

output – the output from the discriminator, followed by the gradient penalty.

Return type:

list [Tensor, Tensor]
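The penalty term from [wgan2] evaluates the critic at random interpolations x_hat = eps * x_real + (1 - eps) * x_fake and penalizes deviations of the gradient norm from 1. Below is a minimal plain-Python sketch of that formula, not DeepChem's implementation. It uses a finite-difference gradient in place of autograd, and the weight lam=10 is the value suggested in the paper. The toy critics in the test are illustrative only and are not defaults read from GradientPenaltyLayer.

```python
import math
import random


def numerical_grad(f, x, h=1e-5):
    """Central-difference estimate of the gradient of scalar function f at x (a list)."""
    grad = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, rng=random):
    """Mean over the batch of lam * (||grad critic(x_hat)|| - 1)^2."""
    total = 0.0
    for x_real, x_fake in zip(real, fake):
        eps = rng.random()  # random interpolation weight in [0, 1)
        x_hat = [eps * a + (1 - eps) * b for a, b in zip(x_real, x_fake)]
        g = numerical_grad(critic, x_hat)
        norm = math.sqrt(sum(v * v for v in g))
        total += lam * (norm - 1.0) ** 2
    return total / len(real)
```

For a linear critic such as 3*x0 + 4*x1, the gradient norm is 5 everywhere, so the penalty is 10 * (5 - 1)**2 = 160 no matter which interpolation point is drawn.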

class MolGANConvolutionLayer(units: int, nodes: int, activation=<built-in method tanh of type object>, dropout_rate: float = 0.0, edges: int = 5, name: str = '', prev_shape: int = 0, device: ~torch.device = device(type='cpu'))[source]¶

Graph convolution layer used in the MolGAN model. MolGAN is a WGAN-type model for generating small molecules. This layer is not meant to be used directly; higher-level layers such as MolGANMultiConvolutionLayer use it. It performs a basic convolution on one-hot encoded matrices containing atom and bond information. The layer also accepts a third input for the case when convolution is performed more than once and the results of the previous convolution need to be used. This avoids creating a separate layer that accepts three inputs rather than two. The third input is the so-called hidden_layer, which holds the results of the previous convolution, while the first two are the unchanged input tensors.
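As a rough illustration of a convolution over one-hot atom and bond matrices, here is a plain-Python sketch of a relational graph convolution. Each bond type k has its own dense weight matrix applied to neighbor features, and a separate self weight is applied to the atom's own features. This follows the relational GCN formulation that MolGAN builds on. It is an assumption about the general technique, not DeepChem's exact computation: bias, dropout, and which adjacency channels are used may differ. All function names are hypothetical.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relational_graph_conv(adj, feats, edge_weights, self_weight, act=math.tanh):
    """One relational graph convolution step.

    adj[i][j][k]    -- 1.0 if atoms i and j are joined by bond type k, else 0.0
    feats[j]        -- feature vector of atom j
    edge_weights[k] -- dense weight matrix (units x n_feat) for bond type k
    self_weight     -- dense weight matrix applied to the atom's own features
    """
    n = len(feats)
    out = []
    for i in range(n):
        h = matvec(self_weight, feats[i])  # self-connection term
        for j in range(n):
            for k, W_k in enumerate(edge_weights):
                if adj[i][j][k]:
                    msg = matvec(W_k, feats[j])  # bond-type-specific transform
                    h = [a + adj[i][j][k] * b for a, b in zip(h, msg)]
        out.append([act(v) for v in h])
    return out
```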

Examples

See MolGANMultiConvolutionLayer for an example of this layer used inside other layers.

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> vertices = 9
>>> nodes = 5
>>> edges = 5
>>> units = 128

>>> layer1 = MolGANConvolutionLayer(units=units, edges=edges, nodes=nodes, name='layer1')
>>> adjacency_tensor = torch.randn((1, vertices, vertices, edges))
>>> node_tensor = torch.randn((1, vertices, nodes))
>>> output = layer1([adjacency_tensor, node_tensor])

References

__init__(units: int, nodes: int, activation=<built-in method tanh of type object>, dropout_rate: float = 0.0, edges: int = 5, name: str = '', prev_shape: int = 0, device: ~torch.device = device(type='cpu'))[source]¶

Initialize this layer.

Parameters:

units (int) – Dimension of dense layers used for convolution
nodes (int) – Number of features in node tensor
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Dropout rate used by dropout layer
edges (int, optional (default=5)) – How many dense layers to use in convolution. Typically equal to number of bond types used in the model.
name (string, optional (default="")) – Name of the layer
prev_shape (int, optional (default=0)) – Shape of the previous layer, used when more than two inputs are passed

forward(inputs: List) → Tuple[Tensor, Tensor, Tensor][source]¶

Invoke this layer

Parameters:: inputs (list) – List of two input matrices, adjacency tensor and node features tensors in one-hot encoding format.
Returns:: The first and second entries are the original input tensors; the third is the result of the convolution
Return type:: tuple(torch.Tensor,torch.Tensor,torch.Tensor)

class MolGANAggregationLayer(units: int = 128, activation=<built-in method tanh of type object>, dropout_rate: float = 0.0, name: str = '', prev_shape: int = 0, device: ~torch.device = device(type='cpu'))[source]¶

Graph aggregation layer used in the MolGAN model. MolGAN is a WGAN-type model for generating small molecules. It performs aggregation on the tensor resulting from the convolution layers. Given its simple nature, it might be removed in the future and moved into MolGANEncoderLayer.
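The MolGAN paper describes graph aggregation as a gated sum over nodes, h_G = tanh(sum_v sigmoid(i(h_v)) * tanh(j(h_v))), where i and j are dense layers. The plain-Python sketch below assumes that formulation, omits bias terms, and uses hypothetical names. The key property it shows is that the output length depends only on the number of units, not on the number of atoms.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_sum_aggregation(node_states, gate_w, value_w, act=math.tanh):
    """Graph-level vector act(sum_v sigmoid(i(h_v)) * tanh(j(h_v))).

    node_states[v]  -- hidden vector of node v
    gate_w, value_w -- weight matrices (units x n_feat) for i(.) and j(.)
    """
    units = len(gate_w)
    total = [0.0] * units
    for h in node_states:
        for u in range(units):
            gate = sigmoid(sum(w * x for w, x in zip(gate_w[u], h)))
            value = math.tanh(sum(w * x for w, x in zip(value_w[u], h)))
            total[u] += gate * value  # soft attention-style gating per node
    return [act(t) for t in total]
```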

Examples

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> vertices = 9
>>> nodes = 5
>>> edges = 5
>>> units = 128

>>> layer_1 = MolGANConvolutionLayer(units=units,nodes=nodes,edges=edges, name='layer1')
>>> layer_2 = MolGANAggregationLayer(units=128, name='layer2')
>>> adjacency_tensor = torch.randn((1, vertices, vertices, edges))
>>> node_tensor = torch.randn((1, vertices, nodes))
>>> hidden_1 = layer_1([adjacency_tensor, node_tensor])
>>> output = layer_2(hidden_1[2])

References

__init__(units: int = 128, activation=<built-in method tanh of type object>, dropout_rate: float = 0.0, name: str = '', prev_shape: int = 0, device: ~torch.device = device(type='cpu'))[source]¶

Initialize the layer

Parameters:

units (int, optional (default=128)) – Dimension of dense layers used for aggregation
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
name (string, optional (default="")) – Name of the layer
prev_shape (int, optional (default=0)) – Shape of the input tensor

forward(inputs: Tensor) → Tensor[source]¶

Invoke this layer

Parameters:: inputs (torch.Tensor) – Single tensor resulting from graph convolution layer
Returns:: aggregation tensor – Result of aggregation function on input convolution tensor.
Return type:: torch.Tensor

class MolGANMultiConvolutionLayer(units: ~typing.Tuple = (128, 64), nodes: int = 5, activation=<built-in method tanh of type object>, dropout_rate: float = 0.0, edges: int = 5, name: str = '', device: ~torch.device = device(type='cpu'), **kwargs)[source]¶

Multiple-pass convolution layer used in the MolGAN model. MolGAN is a WGAN-type model for generating small molecules. It takes the outputs of the previous convolution layer and uses them as inputs for the next one. It simplifies the overall framework, but might be moved to MolGANEncoderLayer in the future in order to reduce the number of layers.
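Chaining convolutions changes the input width of each stage. The sketch below tracks the (input, output) sizes for a given units tuple. It assumes, based on the prev_shape parameter of MolGANConvolutionLayer, that the first convolution sees only the node features and every later one sees the node features concatenated with the previous hidden output. The helper name is hypothetical.

```python
def conv_stack_dims(nodes, units):
    """(input_dim, output_dim) for each convolution in a stack of MolGAN-style layers.

    Assumes layer 0 reads the node features only, and layer k > 0 reads the node
    features concatenated with the hidden output of layer k - 1 (prev_shape).
    """
    dims = []
    prev = 0  # no hidden input for the first convolution
    for u in units:
        dims.append((nodes + prev, u))
        prev = u
    return dims
```

For the defaults nodes=5 and units=(128, 64), this gives [(5, 128), (133, 64)].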

Example

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> vertices = 9
>>> nodes = 5
>>> edges = 5
>>> units = (128,64)

>>> layer_1 = MolGANMultiConvolutionLayer(units=units, nodes=nodes, edges=edges, name='layer1')
>>> adjacency_tensor = torch.randn((1, vertices, vertices, edges))
>>> node_tensor = torch.randn((1, vertices, nodes))
>>> output = layer_1([adjacency_tensor, node_tensor])

References

__init__(units: ~typing.Tuple = (128, 64), nodes: int = 5, activation=<built-in method tanh of type object>, dropout_rate: float = 0.0, edges: int = 5, name: str = '', device: ~torch.device = device(type='cpu'), **kwargs)[source]¶

Initialize the layer

Parameters:

units (Tuple, optional (default=(128,64)), min_length=2) – Tuple of dimensions used by consecutive convolution layers. The more values, the more convolution layers are invoked.
nodes (int, optional (default=5)) – Number of features in node tensor
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
edges (int, optional (default=5)) – Controls how many dense layers are used for a single convolution unit. Typically matches the number of bond types used in the molecule.
name (string, optional (default="")) – Name of the layer

forward(inputs: List) → Tensor[source]¶

Invoke this layer

Parameters:: inputs (list) – List of two input matrices, adjacency tensor and node features tensors in one-hot encoding format.
Returns:: convolution tensor – Result of input tensors going through convolution a number of times.
Return type:: torch.Tensor

class MolGANEncoderLayer(units: ~typing.List = [(128, 64), 128], activation: ~typing.Callable = <built-in method tanh of type object>, dropout_rate: float = 0.0, edges: int = 5, nodes: int = 5, name: str = '', device: ~torch.device = device(type='cpu'), **kwargs)[source]¶

Main learning layer used by the MolGAN model. MolGAN is a WGAN-type model for generating small molecules. Its role is to further simplify the model: the same computation can be built manually by stacking graph convolution layers followed by graph aggregation.

Example

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> vertices = 9
>>> nodes = 5
>>> edges = 5
>>> dropout_rate = 0.0
>>> adjacency_tensor = torch.randn((1, vertices, vertices, edges))
>>> node_tensor = torch.randn((1, vertices, nodes))

>>> graph = MolGANEncoderLayer(units = [(128,64),128], dropout_rate= dropout_rate, edges=edges, nodes=nodes)([adjacency_tensor,node_tensor])
>>> dense = nn.Linear(128,128)(graph)
>>> dense = torch.tanh(dense)
>>> dense = nn.Dropout(dropout_rate)(dense)
>>> dense = nn.Linear(128,64)(dense)
>>> dense = torch.tanh(dense)
>>> dense = nn.Dropout(dropout_rate)(dense)
>>> output = nn.Linear(64,1)(dense)

References

__init__(units: ~typing.List = [(128, 64), 128], activation: ~typing.Callable = <built-in method tanh of type object>, dropout_rate: float = 0.0, edges: int = 5, nodes: int = 5, name: str = '', device: ~torch.device = device(type='cpu'), **kwargs)[source]¶

Initialize the layer

Parameters:

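units (List, optional (default=[(128,64),128])) – Layer dimensions: the first element is a tuple of dimensions used by the consecutive convolution layers (the more values, the more convolution layers are invoked), and the second is the dimension of the aggregation layer.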
activation (function, optional (default=Tanh)) – activation function used across model, default is Tanh
dropout_rate (float, optional (default=0.0)) – Used by dropout layer
edges (int, optional (default=5)) – Controls how many dense layers are used for a single convolution unit. Typically matches the number of bond types used in the molecule.
nodes (int, optional (default=5)) – Number of features in node tensor
name (string, optional (default="")) – Name of the layer

forward(inputs: List) → Tensor[source]¶

Invoke this layer

Parameters:: inputs (list) – List of two input matrices, adjacency tensor and node features tensors in one-hot encoding format.
Returns:: encoder tensor – Tensor that has been through a number of convolutions followed by aggregation.
Return type:: torch.Tensor

class EdgeNetwork(n_pair_features: int = 8, n_hidden: int = 100, init: str = 'xavier_uniform_', **kwargs)[source]¶

The EdgeNetwork module is a PyTorch submodule designed for message passing in graph neural networks.

Examples

>>> import numpy as np
>>> import torch
>>> pair_features = torch.rand((4, 2), dtype=torch.float32)
>>> atom_features = torch.rand((5, 2), dtype=torch.float32)
>>> atom_to_pair = []
>>> n_atoms = 2
>>> start = 0
>>> C0, C1 = np.meshgrid(np.arange(n_atoms), np.arange(n_atoms))
>>> atom_to_pair.append(np.transpose(np.array([C1.flatten() + start, C0.flatten() + start])))
>>> atom_to_pair = torch.Tensor(atom_to_pair)
>>> atom_to_pair = torch.squeeze(atom_to_pair.to(torch.int64), dim=0)
>>> inputs = [pair_features, atom_features, atom_to_pair]
>>> n_pair_features = 2
>>> n_hidden = 2
>>> init = 'xavier_uniform_'
>>> layer = EdgeNetwork(n_pair_features, n_hidden, init)
>>> result = layer(inputs)
>>> result.shape[1]
2

__init__(n_pair_features: int = 8, n_hidden: int = 100, init: str = 'xavier_uniform_', **kwargs)[source]¶

Initializes an EdgeNetwork layer.

Parameters:

n_pair_features (int, optional) – The length of the pair features vector.
n_hidden (int, optional) – number of hidden units in the passing phase
init (str, optional) – Initialization function to be used in the message passing layer.

forward(inputs: List[Tensor]) → Tensor[source]¶

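Parameters:: inputs (List[torch.Tensor]) – A list [pair_features, atom_features, atom_to_pair]. pair_features has shape (n_pairs, n_pair_features), atom_features has shape (n_atoms, n_hidden), and atom_to_pair is an integer tensor of shape (n_pairs, 2) with the same number of rows as pair_features.
Returns:: result – Aggregated messages, one row of length d per atom. Each pair feature vector is mapped to a d × d matrix, where d (n_hidden) denotes the dimension of the internal hidden representation of each node in the graph; this matrix multiplies the neighboring atom's features, and the products are summed per atom.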
Return type:: torch.Tensor
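The edge-network message can be sketched in plain Python. Each pair vector is mapped by (W, b) to n_hidden × n_hidden numbers, reshaped into a matrix A_ij, multiplied by the neighbor's hidden vector h_j, and the results are summed per first atom i. This mirrors the edge-network formulation of Gilmer et al. and how the layer unpacks [pair_features, atom_features, atom_to_pair]. It is a sketch for reading, not the DeepChem source, and the function name is hypothetical.

```python
def edge_network(pair_features, atom_features, atom_to_pair, W, b, n_hidden):
    """Sum over pairs (i, j) of A_ij @ h_j, grouped by the first atom i.

    pair_features[p] -- feature vector of pair p (length n_pair_features)
    atom_features[a] -- hidden vector of atom a (length n_hidden)
    atom_to_pair[p]  -- (i, j) atom indices of pair p
    W, b             -- W is n_pair_features x (n_hidden * n_hidden), b has n_hidden * n_hidden entries
    """
    n_out = max(i for i, _ in atom_to_pair) + 1
    result = [[0.0] * n_hidden for _ in range(n_out)]
    for e, (i, j) in zip(pair_features, atom_to_pair):
        # affine map from the pair vector to a flat n_hidden * n_hidden vector
        flat = [sum(e[f] * W[f][c] for f in range(len(e))) + b[c]
                for c in range(n_hidden * n_hidden)]
        A = [flat[r * n_hidden:(r + 1) * n_hidden] for r in range(n_hidden)]
        h = atom_features[j]
        msg = [sum(A[r][c] * h[c] for c in range(n_hidden)) for r in range(n_hidden)]
        result[i] = [x + y for x, y in zip(result[i], msg)]  # segment sum over i
    return result
```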

class WeaveLayer(n_atom_input_feat: int = 75, n_pair_input_feat: int = 14, n_atom_output_feat: int = 50, n_pair_output_feat: int = 50, n_hidden_AA: int = 50, n_hidden_PA: int = 50, n_hidden_AP: int = 50, n_hidden_PP: int = 50, update_pair: bool = True, init_: str = 'xavier_uniform_', activation: str = 'relu', batch_normalize: bool = True, **kwargs)[source]¶

This class implements the core Weave convolution from the Google graph convolution paper [1]. This is the Torch equivalent of the original implementation using Keras.

This model handles atom features and bond features separately; here, bond features are also called pair features. It implements four types of transformation: atom->atom, atom->pair, pair->atom, and pair->pair.

Examples

This layer expects 4 inputs in a list of the form [atom_features, pair_features, pair_split, atom_to_pair]. We’ll walk through the structure of these inputs. Let’s start with some basic definitions.

>>> import deepchem as dc
>>> import numpy as np

Suppose you have a batch of molecules

>>> smiles = ["CCC", "C"]

Note that there are 4 atoms in total in this system. This layer expects its input molecules to be batched together.

>>> total_n_atoms = 4

Let’s suppose that we have a featurizer that computes n_atom_feat features per atom.

>>> n_atom_feat = 75

Then conceptually, atom_feat is the array of shape (total_n_atoms, n_atom_feat) of atomic features. For simplicity, let’s just go with a random such matrix.

>>> atom_feat = np.random.rand(total_n_atoms, n_atom_feat)

Let’s suppose we have n_pair_feat pairwise features

>>> n_pair_feat = 14

For each molecule, we compute a matrix of shape (n_atoms*n_atoms,n_pair_feat) of pairwise features for each pair of atoms in the molecule. Let’s construct this conceptually for our example.

>>> pair_feat = [np.random.rand(3*3, n_pair_feat), np.random.rand(1*1,n_pair_feat)]
>>> pair_feat = np.concatenate(pair_feat, axis=0)
>>> pair_feat.shape
(10, 14)

pair_split is an index into pair_feat which tells us which atom each row belongs to. In our case, we have

>>> pair_split = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3])

That is, the first 9 entries belong to “CCC” and the last entry to “C”. The final input, atom_to_pair, is more detailed than pair_split: it tells us the precise pair of atoms each row of pair features belongs to. In our case

>>> atom_to_pair = np.array([[0, 0],
...                          [0, 1],
...                          [0, 2],
...                          [1, 0],
...                          [1, 1],
...                          [1, 2],
...                          [2, 0],
...                          [2, 1],
...                          [2, 2],
...                          [3, 3]])
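The index arrays above can be generated for any batch from the number of atoms in each molecule. This plain-Python sketch assumes, as in this example, that every ordered pair of atoms within a molecule (including an atom with itself) gets one row. A real featurizer may restrict which pairs are included, so treat this helper (a hypothetical name) as illustrative only.

```python
def weave_pair_indices(atoms_per_mol):
    """Build pair_split and atom_to_pair for a batch, using all ordered atom pairs per molecule."""
    pair_split, atom_to_pair = [], []
    start = 0  # global index of the first atom in the current molecule
    for n in atoms_per_mol:
        for a in range(start, start + n):
            for b in range(start, start + n):
                pair_split.append(a)        # row belongs to atom a
                atom_to_pair.append([a, b])  # exact (a, b) pair
        start += n
    return pair_split, atom_to_pair
```

Calling weave_pair_indices([3, 1]) for “CCC” and “C” reproduces the pair_split and atom_to_pair arrays shown above.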

Let’s now define the actual layer

>>> layer = WeaveLayer()

And invoke it

>>> [A, P] = layer([atom_feat, pair_feat, pair_split, atom_to_pair])

The weave layer produces new atom/pair features. Let’s check their shapes

>>> A = A.detach().numpy()
>>> A.shape
(4, 50)
>>> P = P.detach().numpy()
>>> P.shape
(10, 50)

The 4 is total_n_atoms and the 10 is the total number of pairs. The 50 comes from the default values of n_atom_output_feat and n_pair_output_feat.
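In the pair->atom transformation, the (transformed) rows of pair_feat that share a pair_split index are summed into one row per atom. This is why 10 pair rows can feed into 4 atom rows. The plain-Python segment sum below illustrates that reduction under that assumption; WeaveGather performs the same kind of sum over atom_split to go from atoms to molecules. The function name is hypothetical.

```python
def segment_sum(rows, segment_ids, n_segments):
    """Sum rows that share a segment id (e.g. pair features belonging to the same atom)."""
    width = len(rows[0])
    out = [[0.0] * width for _ in range(n_segments)]
    for row, s in zip(rows, segment_ids):
        out[s] = [a + b for a, b in zip(out[s], row)]
    return out
```

With pair_split from the example, each carbon in “CCC” receives the sum of 3 pair rows and the lone carbon in “C” receives 1.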

References

__init__(n_atom_input_feat: int = 75, n_pair_input_feat: int = 14, n_atom_output_feat: int = 50, n_pair_output_feat: int = 50, n_hidden_AA: int = 50, n_hidden_PA: int = 50, n_hidden_AP: int = 50, n_hidden_PP: int = 50, update_pair: bool = True, init_: str = 'xavier_uniform_', activation: str = 'relu', batch_normalize: bool = True, **kwargs)[source]¶

Parameters:

n_atom_input_feat (int, optional (default 75)) – Number of features for each atom in input.
n_pair_input_feat (int, optional (default 14)) – Number of features for each pair of atoms in input.
n_atom_output_feat (int, optional (default 50)) – Number of features for each atom in output.
n_pair_output_feat (int, optional (default 50)) – Number of features for each pair of atoms in output.
n_hidden_AA (int, optional (default 50)) – Number of units (convolution depths) in corresponding hidden layer
n_hidden_PA (int, optional (default 50)) – Number of units (convolution depths) in corresponding hidden layer
n_hidden_AP (int, optional (default 50)) – Number of units (convolution depths) in corresponding hidden layer
n_hidden_PP (int, optional (default 50)) – Number of units (convolution depths) in corresponding hidden layer
update_pair (bool, optional (default True)) – Whether to compute updated pair features; can be turned off for the last layer
init_ (str, optional (default ‘xavier_uniform_’)) – Weight initialization for filters.
activation (str, optional (default 'relu')) – Activation function applied
batch_normalize (bool, optional (default True)) – If this is turned on, apply batch normalization before applying activation functions on convolutional layers.

forward(inputs: List[ndarray]) → List[Tensor][source]¶

Creates weave tensors.

Parameters:

inputs (List[Union[np.ndarray, np.ndarray, np.ndarray, np.ndarray]]) – Should contain 4 tensors [atom_features, pair_features, pair_split, atom_to_pair]

Returns:

A: Atom features tensor of shape (total_num_atoms,atom feature size)

P: Pair features tensor of shape (total num of pairs,bond feature size)

Return type:

List[Union[torch.Tensor, torch.Tensor]]

class WeaveGather(batch_size: int, n_input: int = 128, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, init_: str = 'xavier_uniform_', activation: str = 'tanh', **kwargs)[source]¶

Implements the weave-gathering section of weave convolutions. This is the Torch equivalent of the original implementation using Keras.

Implements the gathering layer from [1]. The weave gathering layer gathers per-atom features to create a molecule-level fingerprint in a weave convolutional network. This layer can also perform Gaussian histogram expansion as detailed in [1]. Note that the gathering function here is simply addition, as in [1].

Examples

This layer expects 2 inputs in a list of the form [atom_features, atom_split]. We’ll walk through the structure of these inputs. Let’s start with some basic definitions.

>>> import deepchem as dc
>>> import numpy as np

Suppose you have a batch of molecules

>>> smiles = ["CCC", "C"]

Note that there are 4 atoms in total in this system. This layer expects its input molecules to be batched together.

>>> total_n_atoms = 4

Let’s suppose that we have n_atom_feat features per atom.

>>> n_atom_feat = 75

Then conceptually, atom_feat is the array of shape (total_n_atoms, n_atom_feat) of atomic features. For simplicity, let’s just go with a random such matrix.

>>> atom_feat = np.random.rand(total_n_atoms, n_atom_feat)

We then need to provide a mapping from each atom to the molecule it belongs to. In our case this would be

>>> atom_split = np.array([0, 0, 0, 1])

Let’s now define the actual layer

>>> gather = WeaveGather(batch_size=2, n_input=n_atom_feat)
>>> output_molecules = gather([atom_feat, atom_split])
>>> len(output_molecules)
2

References

__init__(batch_size: int, n_input: int = 128, gaussian_expand: bool = True, compress_post_gaussian_expansion: bool = False, init_: str = 'xavier_uniform_', activation: str = 'tanh', **kwargs)[source]¶

Parameters:

batch_size (int) – number of molecules in a batch
n_input (int, optional (default 128)) – number of features for each input molecule
gaussian_expand (boolean, optional (default True)) – Whether to expand each dimension of atomic features by Gaussian histogram
compress_post_gaussian_expansion (bool, optional (default False)) – If True, compress the results of the Gaussian expansion back to the original dimensions of the input by using a linear layer with specified activation function. Note that this compression was not in the original paper, but was present in the original DeepChem implementation so is left present for backwards compatibility.
init_ (str, optional (default ‘xavier_uniform_’)) – Weight initialization for filters if compress_post_gaussian_expansion is True.
activation (str, optional (default 'tanh')) – Activation function applied for filters if compress_post_gaussian_expansion is True.

forward(inputs: List[ndarray]) → Tensor[source]¶

Creates weave tensors.

Parameters:: inputs (List[Union[np.ndarray,np.ndarray]]) – Should contain 2 tensors [atom_features, atom_split]
Returns:: output_molecules – Each entry in this list is of shape (self.n_inputs,)
Return type:: torch.Tensor

gaussian_histogram(x: Tensor) → Tensor[source]¶

Expands input into a set of gaussian histogram bins.

Parameters:: x (torch.Tensor) – Of shape (N, n_feat)

Examples

This method uses 11 bins spanning portions of a Gaussian with zero mean and unit standard deviation.

>>> gaussian_memberships = [(-1.645, 0.283), (-1.080, 0.170),
...                         (-0.739, 0.134), (-0.468, 0.118),
...                         (-0.228, 0.114), (0., 0.114),
...                         (0.228, 0.114), (0.468, 0.118),
...                         (0.739, 0.134), (1.080, 0.170),
...                         (1.645, 0.283)]

We construct a Gaussian at gaussian_memberships[i][0] with standard deviation gaussian_memberships[i][1]. Each feature in x is assigned the probability of falling in each Gaussian, and probabilities are normalized across the 11 different Gaussians.

Returns:: outputs – Of shape (N, 11*n_feat)
Return type:: torch.Tensor
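The expansion described above can be sketched in plain Python. In this sketch, each Gaussian is scaled so that its peak equals 1 (exp(-0.5 * z**2)) before normalizing across the 11 bins. That peak scaling is an assumption about an implementation detail: with raw densities, the narrower central bins would be weighted more heavily. Output rows are laid out as consecutive blocks of 11 memberships per input feature, giving shape (N, 11 * n_feat).

```python
import math

GAUSSIAN_MEMBERSHIPS = [(-1.645, 0.283), (-1.080, 0.170), (-0.739, 0.134),
                        (-0.468, 0.118), (-0.228, 0.114), (0., 0.114),
                        (0.228, 0.114), (0.468, 0.118), (0.739, 0.134),
                        (1.080, 0.170), (1.645, 0.283)]


def gaussian_histogram(x):
    """Expand each row of x (N x n_feat) into N x (11 * n_feat) normalized memberships."""
    out = []
    for row in x:
        expanded = []
        for value in row:
            # peak-normalized Gaussian membership for each of the 11 bins
            weights = [math.exp(-0.5 * ((value - mu) / sigma) ** 2)
                       for mu, sigma in GAUSSIAN_MEMBERSHIPS]
            total = sum(weights)
            expanded.extend(w / total for w in weights)  # normalize across bins
        out.append(expanded)
    return out
```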

class MXMNetGlobalMessagePassing(dim: int, activation_fn: Callable | str = 'silu')[source]¶

This class implements the Global Message Passing Layer from the Molecular Mechanics-Driven Graph Neural Network with Multiplex Graph for Molecular Structures (MXMNet) paper [1].

This layer consists of two message passing steps and an update step between them.

Let:

x_i : The node to be updated
h_i : The hidden state of x_i
x_j : The neighbour node connected to x_i by edge e_ij
h_j : The hidden state of x_j
W : The edge weights
m_ij : The message between x_i and x_j
h_j (self_loop) : The set of hidden states of atom features
mlp : MultilayerPerceptron
res : ResidualBlock

In each message passing step

m_ij = mlp1([h_i || h_j || e_ij])*(e_ij W)

To handle self loops

m_ij = m_ij + h_j(self_loop)

In each update step

hm_j = res1(sum(m_ij))
h_j_new = mlp2(hm_j) + h_j
h_j_new = res2(h_j_new)
h_j_new = res3(h_j_new)

Message passing and message aggregation (sum) are handled by propagate().
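The message-then-sum pattern that propagate() handles can be sketched in plain Python. In this sketch, edge k runs from sources[k] (x_j) to targets[k] (x_i), messages are summed per target node, and the self-loop term m_ij + h_j(self_loop) is modeled by adding each node's own hidden state. The message function is a hypothetical placeholder for mlp1([h_i || h_j || e_ij]) * (e_ij W); this is not MXMNet's actual code.

```python
def propagate(node_states, edge_index, edge_attr, message_fn, add_self_loops=True):
    """Sum messages from source nodes j into target nodes i.

    edge_index -- [sources, targets]; edge k goes sources[k] -> targets[k]
    message_fn -- hypothetical callable (h_i, h_j, e_ij) -> message vector
    """
    n = len(node_states)
    dim = len(node_states[0])
    aggregated = [[0.0] * dim for _ in range(n)]
    sources, targets = edge_index
    for j, i, e in zip(sources, targets, edge_attr):
        m = message_fn(node_states[i], node_states[j], e)
        aggregated[i] = [a + b for a, b in zip(aggregated[i], m)]  # sum aggregation
    if add_self_loops:
        # self-loop term: each node also receives its own hidden state
        aggregated = [[a + b for a, b in zip(agg, h)] for agg, h in zip(aggregated, node_states)]
    return aggregated
```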

References

Examples

The provided example demonstrates how to use the GlobalMessagePassing layer by creating an instance, passing input tensors (node_features, edge_attributes, edge_indices) through it, and checking the shape of the output.

Initialize the input tensors with specific values.

>>> dim = 1
>>> node_features = torch.tensor([[0.8343], [1.2713], [1.2713], [1.2713], [1.2713]])
>>> edge_attributes = torch.tensor([[1.0004], [1.0004], [1.0005], [1.0004], [1.0004],[-0.2644], [-0.2644], [-0.2644], [1.0004],[-0.2644], [-0.2644], [-0.2644], [1.0005],[-0.2644], [-0.2644], [-0.2644], [1.0004],[-0.2644], [-0.2644], [-0.2644]])
>>> edge_indices = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],[1, 2, 3, 4, 0, 2, 3, 4, 0, 1, 3, 4, 0, 1, 2, 4, 0, 1, 2, 3]])
>>> out = MXMNetGlobalMessagePassing(dim)
>>> output = out(node_features, edge_attributes, edge_indices)
>>> output.shape
torch.Size([5, 1])

__init__(dim: int, activation_fn: Callable | str = 'silu')[source]¶

Initializes the MXMNetGlobalMessagePassing layer.

Parameters:: dim (int) – The dimension of the input and output features.

forward(node_features: Tensor, edge_attributes: Tensor, edge_indices: Tensor) → Tensor[source]¶

Performs the forward pass of the GlobalMessagePassing layer.

Parameters:

node_features (torch.Tensor) – The input node features tensor of shape (num_nodes, feature_dim).
edge_attributes (torch.Tensor) – The input edge attribute tensor of shape (num_edges, attribute_dim).
edge_indices (torch.Tensor) – The input edge index tensor of shape (2, num_edges).

Returns:

The updated node features tensor after message passing of shape (num_nodes, feature_dim).

Return type:

torch.Tensor

message(x_i: Tensor, x_j: Tensor, edge_attr: Tensor) → Tensor[source]¶

Constructs messages to be passed along the edges in the graph.

Parameters:

x_i (torch.Tensor) – Features of the node being updated (the target of each edge), of shape (num_edges+num_nodes, feature_dim).
x_j (torch.Tensor) – Features of the neighboring node (the source of each edge), of shape (num_edges+num_nodes, feature_dim).
edge_attr (torch.Tensor) – The edge attribute tensor of shape (num_edges, attribute_dim).

Returns:

The constructed messages tensor.

Return type:

torch.Tensor

class MXMNetBesselBasisLayer(num_radial: int, cutoff: float = 5.0, envelope_exponent: int = 5)[source]¶

This layer implements a basis layer for the MXMNet model using Bessel functions. The basis layer is used to model radial symmetry in molecular systems.

The output of the layer is given by: output = envelope(dist / cutoff) * (freq * dist / cutoff).sin()
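The formula above can be sketched in plain Python. The polynomial envelope and the initial frequencies n·π come from the DimeNet radial basis that MXMNet builds on. They are assumptions about this layer's internals, though consistent with reset_parameters initializing freq proportional to π. In this sketch, distances at or beyond cutoff produce zeros, and distances must be positive because the envelope contains a 1/x term.

```python
import math


def envelope(x, exponent):
    """DimeNet-style polynomial envelope (assumed); goes smoothly to 0 at x = 1."""
    p = exponent + 1
    a = -(p + 1) * (p + 2) / 2
    b = p * (p + 2)
    c = -p * (p + 1) / 2
    if x >= 1.0:
        return 0.0
    return 1.0 / x + a * x ** (p - 1) + b * x ** p + c * x ** (p + 1)


def bessel_basis(distances, num_radial, cutoff=5.0, envelope_exponent=5):
    """Return a len(distances) x num_radial list of radial features."""
    freq = [n * math.pi for n in range(1, num_radial + 1)]  # initial frequencies n * pi
    out = []
    for d in distances:
        x = d / cutoff
        env = envelope(x, envelope_exponent)
        out.append([env * math.sin(f * x) for f in freq])
    return out
```

With the example inputs (num_radial=2, cutoff=2.0, envelope_exponent=2), this returns 4 rows of 2 values, and the rows for distances 2.0 and 3.0 are zero.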

Examples

>>> radial_layer = MXMNetBesselBasisLayer(num_radial=2, cutoff=2.0, envelope_exponent=2)
>>> distances = torch.tensor([0.5, 1.0, 2.0, 3.0])
>>> output = radial_layer(distances)
>>> output.shape
torch.Size([4, 2])

__init__(num_radial: int, cutoff: float = 5.0, envelope_exponent: int = 5)[source]¶

Initialize the MXMNet Bessel Basis Layer.

Parameters:

num_radial (int) – The number of radial basis functions to use.
cutoff (float, optional (default 5.0)) – The radial cutoff distance used to scale the distances.
envelope_exponent (int, optional (default 5)) – The exponent of the envelope function.

reset_parameters()[source]¶

Reset and initialize the learnable parameters of the MXMNet Bessel Basis Layer.

The ‘freq’ tensor, representing the frequencies of the Bessel functions, is set up with initial values proportional to π (PI) and becomes a learnable parameter.

The ‘freq’ tensor will be updated during the training process to optimize the performance of the MXMNet model for the specific task it is being trained on.

forward(dist: Tensor) → Tensor[source]¶

Compute the output of the MXMNet Bessel Basis Layer.

Parameters:: dist (torch.Tensor) – The input tensor representing the pairwise distances between atoms.
Returns:: output – The output tensor representing the radial basis functions applied to the input distances.
Return type:: torch.Tensor

class DTNN(n_tasks: int, n_embedding: int = 30, n_hidden: int = 100, n_distance: int = 100, distance_min: float = -1, distance_max: float = 18, output_activation: bool = True, mode: str = 'regression', dropout: float = 0.0, n_steps: int = 2)[source]¶

Deep Tensor Neural Networks

DTNN is based on the many-body Hamiltonian concept, a fundamental principle in quantum mechanics. The DTNN receives a molecule’s distance matrix and the membership of its atoms from its Coulomb Matrix representation. It then iteratively refines the representation of each atom by considering its interactions with neighboring atoms. Finally, it predicts the energy of the molecule by summing up the energies of the individual atoms.

In this class, we establish a sequential model for the Deep Tensor Neural Network (DTNN) [1].

Examples

>>> import os
>>> import torch
>>> from deepchem.models.torch_models import DTNN
>>> from deepchem.data import SDFLoader
>>> from deepchem.feat import CoulombMatrix
>>> from deepchem.utils import batch_coulomb_matrix_features
>>> # Get Data
>>> model_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
>>> dataset_file = os.path.join(model_dir, 'tests/assets/qm9_mini.sdf')
>>> TASKS = ["alpha", "homo"]
>>> loader = SDFLoader(tasks=TASKS, featurizer=CoulombMatrix(29), sanitize=True)
>>> data = loader.create_dataset(dataset_file, shard_size=100)
>>> inputs = batch_coulomb_matrix_features(data.X)
>>> atom_number, distance, atom_membership, distance_membership_i, distance_membership_j = inputs
>>> inputs = [torch.tensor(atom_number).to(torch.int64),
...           torch.tensor(distance).to(torch.float32),
...           torch.tensor(atom_membership).to(torch.int64),
...           torch.tensor(distance_membership_i).to(torch.int64),
...           torch.tensor(distance_membership_j).to(torch.int64)]
>>> n_tasks = data.y.shape[1]
>>> model = DTNN(n_tasks)
>>> pred = model(inputs)

References

__init__(n_tasks: int, n_embedding: int = 30, n_hidden: int = 100, n_distance: int = 100, distance_min: float = -1, distance_max: float = 18, output_activation: bool = True, mode: str = 'regression', dropout: float = 0.0, n_steps: int = 2)[source]¶

Parameters:

n_tasks (int) – Number of tasks
n_embedding (int (default 30)) – Number of features per atom.
n_hidden (int (default 100)) – Number of features for each molecule after DTNNStep
n_distance (int (default 100)) – Granularity of the distance matrix; the step size will be (distance_max-distance_min)/n_distance
distance_min (float (default -1)) – minimum distance of atom pairs (in Angstrom)
distance_max (float (default 18)) – maximum distance of atom pairs (in Angstrom)
output_activation (bool (default True)) – Determines whether an activation function should be applied to the output.
mode (str (default "regression")) – Only “regression” is currently supported.
dropout (float (default 0.0)) – The dropout probability to use.
n_steps (int (default 2)) – Number of DTNNStep Layers to use.
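The n_distance, distance_min, and distance_max parameters describe how each interatomic distance is expanded into a vector of Gaussian features centered on an evenly spaced grid. This plain-Python sketch assumes the Gaussian width equals the grid step size, as in the Gaussian expansion of Schütt et al. It is meant to show the role of these parameters, not to reproduce DTNN's code exactly. The function name is hypothetical.

```python
import math


def gaussian_distance_expansion(d, n_distance=100, distance_min=-1.0, distance_max=18.0):
    """Expand one interatomic distance d into n_distance Gaussian features."""
    step = (distance_max - distance_min) / n_distance
    centers = [distance_min + k * step for k in range(n_distance)]
    # width of each Gaussian assumed equal to the grid step
    return [math.exp(-((d - c) ** 2) / (2 * step ** 2)) for c in centers]
```

A distance that falls exactly on a grid point yields a feature of 1.0 at that index, decaying smoothly for neighboring centers.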

forward(inputs: List[Tensor])[source]¶

Parameters:: inputs (List) – A list of tensors containing atom_number, distance, atom_membership, distance_membership_i, and distance_membership_j.
Returns:: output – Predictions of the Molecular Energy.
Return type:: torch.Tensor

class VariationalRandomizer(embedding_dimension: int, annealing_start_step: int, annealing_final_step: int, **kwargs)[source]¶

Add random noise to the embedding and include a corresponding loss.

This adds random noise to the encoder, and also adds a constraint term to the loss that forces the embedding vector to have a unit Gaussian distribution. We can then pick random vectors from a Gaussian distribution, and the output sequences should follow the same distribution as the training data.

We can use this layer with an AutoEncoder, which makes it a Variational AutoEncoder. The constraint term in the loss is initially set to 0, so the optimizer just tries to minimize the reconstruction loss. Once it has made reasonable progress toward that, the constraint term can be gradually turned back on. The range of steps over which this happens is configured by the annealing_start_step and annealing_final_step parameters.
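The three ingredients described above are reparameterized noise, a KL constraint toward a unit Gaussian, and an annealed loss weight. They can be sketched in plain Python. This sketch assumes the mean and log-variance come from dense layers applied to the embedding, and it uses a simple linear ramp between annealing_start_step and annealing_final_step. The layer's actual schedule may have a different shape. All names here are hypothetical.

```python
import math


def kl_weight(step, start, final):
    """Linear ramp from 0 at `start` to 1 at `final` (schedule shape assumed)."""
    if final <= start:
        return 1.0
    return min(1.0, max(0.0, (step - start) / (final - start)))


def randomize(mean, log_var, noise):
    """Reparameterization: z = mean + exp(log_var / 2) * noise, with noise ~ N(0, 1)."""
    return [m + math.exp(lv / 2) * n for m, lv, n in zip(mean, log_var, noise)]


def kl_to_unit_gaussian(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, 1)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var))
```

A training loop would then add kl_weight(step, start, final) * kl_to_unit_gaussian(mean, log_var) to the reconstruction loss, so the constraint stays off until the start step.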

Examples

>>> from deepchem.models.torch_models.layers import VariationalRandomizer
>>> import torch
>>> embedding_dimension = 512
>>> batch_size = 100
>>> annealing_start_step = 1000
>>> annealing_final_step = 2000
>>> embedding_shape = (batch_size, embedding_dimension)
>>> embeddings = torch.rand(embedding_shape)
>>> global_step = torch.tensor([100])
>>> layer = VariationalRandomizer(embedding_dimension, annealing_start_step, annealing_final_step)
>>> output = layer([embeddings, global_step])
>>> output.shape
torch.Size([100, 512])

References

__init__(embedding_dimension: int, annealing_start_step: int, annealing_final_step: int, **kwargs)[source]¶

Initialize the VariationalRandomizer layer.

Parameters:

embedding_dimension (int) – The dimension of the embedding.
annealing_start_step (int) – the step (that is, batch) at which to begin turning on the constraint term for KL cost annealing.
annealing_final_step (int) – the step (that is, batch) at which to finish turning on the constraint term for KL cost annealing.

forward(inputs: List[Tensor], training=True)[source]¶

Returns the Variationally Randomized Embedding.

Parameters:

inputs (List[torch.Tensor]) – A list of two tensors, the first of which is the input to the layer and the second of which is the global step.
training (bool, optional (default True)) – Whether to use the layer in training mode or inference mode.

Returns:

embedding – The embedding tensor.

Return type:

torch.Tensor

add_loss(loss)[source]¶

Add a loss term to the layer.

Parameters:: loss (torch.Tensor) – The loss tensor to add to the layer.

class EncoderRNN(input_size: int, hidden_size: int, n_layers: int, dropout_p: float = 0.1, **kwargs)[source]¶

Encoder Layer for SeqToSeq Model.

It takes input sequences and converts them into a fixed-size context vector called the “embedding”, which summarizes the relevant information in the input sequence. The decoder uses this context vector to generate the output sequence, and it can also serve as a representation of the input sequence for other models.

Examples

>>> from deepchem.models.torch_models.layers import EncoderRNN
>>> import torch
>>> embedding_dimensions = 7
>>> num_input_token = 4
>>> n_layers = 9
>>> input = torch.tensor([[1, 0, 2, 3, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
>>> layer = EncoderRNN(num_input_token, embedding_dimensions, n_layers)
>>> emb, hidden = layer(input)
>>> emb.shape
torch.Size([3, 5, 7])

References

__init__(input_size: int, hidden_size: int, n_layers: int, dropout_p: float = 0.1, **kwargs)[source]¶

Initialize the EncoderRNN layer.

Parameters:

input_size (int) – The number of expected features (the size of the input token vocabulary).
hidden_size (int) – The number of features in the hidden state.
n_layers (int) – The number of recurrent layers.
dropout_p (float (default 0.1)) – The dropout probability to use during training.

forward(input: Tensor)[source]¶

Returns embeddings for the provided sequences.

Parameters:

input (torch.Tensor) – Batch of input sequences.

Returns:

output (torch.Tensor) – Batch of Embeddings.
hidden (torch.Tensor) – Batch of hidden states.

class DecoderRNN(hidden_size: int, output_size: int, n_layers: int, max_length: int, batch_size: int, step_activation: str = 'relu', **kwargs)[source]¶

Decoder Layer for SeqToSeq Model.

The decoder transforms the embedding vector into the output sequence. It is trained to predict the next token in the sequence given the previous tokens in the sequence. It uses the context vector from the encoder to help generate the correct token in the sequence.
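The next-token behaviour described above can be illustrated with a greedy decoding loop. The sketch below is a hypothetical stand-in, not DeepChem's DecoderRNN: the class name TinyGreedyDecoder is invented, it assumes the encoder embedding becomes the initial GRU hidden state, and it assumes token index 0 acts as a start-of-sequence token.

```python
import torch
import torch.nn as nn


class TinyGreedyDecoder(nn.Module):
    """Hypothetical minimal GRU decoder (not DeepChem's DecoderRNN).

    Assumptions: the encoder embedding is the initial hidden state, and
    token index 0 is the start-of-sequence token.
    """

    def __init__(self, hidden_size: int, output_size: int):
        super().__init__()
        self.token_embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, embedding: torch.Tensor, max_length: int):
        batch_size = embedding.shape[0]
        # nn.GRU expects the hidden state as (num_layers, batch, hidden_size).
        hidden = embedding.unsqueeze(0)
        # Every sequence starts with the assumed start token 0.
        token = torch.zeros(batch_size, 1, dtype=torch.long)
        logits_per_step = []
        for _ in range(max_length):
            step_input = self.token_embedding(token)            # (B, 1, H)
            step_output, hidden = self.gru(step_input, hidden)  # (B, 1, H)
            logits = self.out(step_output)                      # (B, 1, V)
            logits_per_step.append(logits)
            # Greedy choice: feed the most likely token back in as the next input.
            token = logits.argmax(dim=-1)                       # (B, 1)
        return torch.cat(logits_per_step, dim=1), hidden        # (B, T, V)


decoder = TinyGreedyDecoder(hidden_size=16, output_size=7)
logits, hidden = decoder(torch.randn(4, 16), max_length=10)
print(logits.shape)  # torch.Size([4, 10, 7])
```

Each step conditions on the previously emitted token and on the hidden state that started from the embedding. A common alternative during training is to feed the ground-truth previous token instead of the argmax.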

Examples

>>> from deepchem.models.torch_models.layers import DecoderRNN
>>> import torch
>>> embedding_dimensions = 512
>>> num_output_tokens = 7
>>> max_length = 10
>>> batch_size = 100
>>> n_layers = 2
>>> layer = DecoderRNN(embedding_dimensions, num_output_tokens, n_layers, max_length, batch_size)
>>> embeddings = torch.randn(batch_size, embedding_dimensions)
>>> output, hidden = layer([embeddings, None])
>>> output.shape
torch.Size([100, 10, 7])

References

__init__(hidden_size: int, output_size: int, n_layers: int, max_length: int, batch_size: int, step_activation: str = 'relu', **kwargs)[source]¶

Initialize the DecoderRNN layer.

Parameters:

hidden_size (int) – Number of features in the hidden state.
output_size (int) – Number of expected features (the size of the output token vocabulary).
n_layers (int) – Number of recurrent layers.
max_length (int) – Maximum length of the sequence.
batch_size (int) – Batch size of the input.
step_activation (str (default "relu")) – Activation function to use after every step.

forward(inputs: List[Tensor])[source]¶

Parameters:

inputs (List[torch.Tensor]) – A list of tensors containing encoder_hidden and target_tensor.

Returns:

decoder_outputs (torch.Tensor) – Predicted output sequences.
decoder_hidden (torch.Tensor) – Hidden state of the decoder.

class SeqToSeq(n_input_tokens: int, n_output_tokens: int, max_output_length: int, encoder_layers: int = 4, decoder_layers: int = 4, batch_size: int = 100, embedding_dimension: int = 512, dropout: float = 0.0, variational: bool = False, annealing_start_step: int = 5000, annealing_final_step: int = 10000)[source]¶

Implements sequence to sequence translation models.

The model is based on the description in Sutskever et al., “Sequence to Sequence Learning with Neural Networks” (https://arxiv.org/abs/1409.3215), although this implementation uses GRUs instead of LSTMs. The goal is to take sequences of tokens as input, and translate each one into a different output sequence. The input and output sequences can both be of variable length, and an output sequence need not have the same length as the input sequence it was generated from. For example, these models were originally developed for use in natural language processing. In that context, the input might be a sequence of English words, and the output might be a sequence of French words. The goal would be to train the model to translate sentences from English to French.

The model consists of two parts called the “encoder” and “decoder”. Each one consists of a stack of recurrent layers. The job of the encoder is to transform the input sequence into a single, fixed length vector called the “embedding”. That vector contains all relevant information from the input sequence. The decoder then transforms the embedding vector into the output sequence.

These models can be used for various purposes. First and most obviously, they can be used for sequence to sequence translation. In any case where you have sequences of tokens, and you want to translate each one into a different sequence, a SeqToSeq model can be trained to perform the translation.

Another possible use case is transforming variable length sequences into fixed length vectors. Many types of models require their inputs to have a fixed shape, which makes it difficult to use them with variable sized inputs (for example, when the input is a molecule, and different molecules have different numbers of atoms). In that case, you can train a SeqToSeq model as an autoencoder, so that it tries to make the output sequence identical to the input one. That forces the embedding vector to contain all information from the original sequence. You can then use the encoder for transforming sequences into fixed length embedding vectors, suitable to use as inputs to other types of models.

Another use case is to train the decoder for use as a generative model. Here again you begin by training the SeqToSeq model as an autoencoder. Once training is complete, you can supply arbitrary embedding vectors, and transform each one into an output sequence. When used in this way, you typically train it as a variational autoencoder. This adds random noise to the encoder, and also adds a constraint term to the loss that forces the embedding vector to have a unit Gaussian distribution. You can then pick random vectors from a Gaussian distribution, and the output sequences should follow the same distribution as the training data.
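The two ingredients of the variational setup described above, noise added to the encoder output and a constraint pulling the embedding toward a unit Gaussian, can be sketched with the reparameterization trick and the closed-form Gaussian KL divergence. This is a minimal sketch assuming the encoder predicts a mean and a log-variance per embedding dimension; variational_sample is a hypothetical helper, and DeepChem's VariationalRandomizer may parameterize the noise differently.

```python
import torch


def variational_sample(mean: torch.Tensor, log_var: torch.Tensor):
    """Hypothetical helper: reparameterized sample plus KL to a unit Gaussian.

    Assumes `mean` and `log_var` both have shape (batch_size, embedding_dimension).
    """
    std = torch.exp(0.5 * log_var)
    # Reparameterization trick: noise is drawn independently of the parameters,
    # so gradients flow through `mean` and `std`.
    sample = mean + std * torch.randn_like(std)
    # Closed-form KL(N(mean, std^2) || N(0, 1)), summed over embedding dimensions.
    kl = 0.5 * torch.sum(mean ** 2 + log_var.exp() - log_var - 1.0, dim=-1)
    return sample, kl


sample, kl = variational_sample(torch.zeros(3, 8), torch.zeros(3, 8))
print(sample.shape)  # torch.Size([3, 8])
print(kl)            # tensor([0., 0., 0.])
```

When the predicted distribution already is a unit Gaussian (mean 0, log-variance 0) the constraint term is zero; any shift away from it is penalized.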

When training as a variational autoencoder, it is best to use KL cost annealing, as described in https://arxiv.org/abs/1511.06349. The constraint term in the loss is initially set to 0, so the optimizer just tries to minimize the reconstruction loss. Once it has made reasonable progress toward that, the constraint term can be gradually turned back on. The range of steps over which this happens is configurable.
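One way to realize the configurable annealing range is a weight on the constraint term that rises from 0 to 1 between annealing_start_step and annealing_final_step. The linear ramp below is an assumption made for illustration (the cited paper and DeepChem may use a different curve), and kl_weight is a hypothetical helper name.

```python
def kl_weight(global_step: int, annealing_start_step: int,
              annealing_final_step: int) -> float:
    """Hypothetical linear KL-annealing schedule.

    Returns 0 before annealing starts, 1 after it finishes, and a linear
    interpolation in between.
    """
    if global_step <= annealing_start_step:
        return 0.0
    if global_step >= annealing_final_step:
        return 1.0
    progress = global_step - annealing_start_step
    return progress / (annealing_final_step - annealing_start_step)


# With the SeqToSeq defaults (5000, 10000):
for step in (0, 5000, 7500, 10000, 20000):
    print(step, kl_weight(step, 5000, 10000))
# 0 0.0
# 5000 0.0
# 7500 0.5
# 10000 1.0
# 20000 1.0
```

Under this assumption the training loss at a given step would be reconstruction_loss + kl_weight(step, start, final) * kl.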

This class implements the Sequence to Sequence (SeqToSeq) model [1].

Examples

>>> import torch
>>> from deepchem.models.torch_models.seqtoseq import SeqToSeq
>>> from deepchem.utils.batch_utils import create_input_array
>>> # Dataset of SMILES strings for testing SeqToSeq models.
>>> train_smiles = [
...     'Cc1cccc(N2CCN(C(=O)C34CC5CC(CC(C5)C3)C4)CC2)c1C',
...     'Cn1ccnc1SCC(=O)Nc1ccc(Oc2ccccc2)cc1',
...     'COc1cc2c(cc1NC(=O)CN1C(=O)NC3(CCc4ccccc43)C1=O)oc1ccccc12',
...     'CCCc1cc(=O)nc(SCC(=O)N(CC(C)C)C2CCS(=O)(=O)C2)[nH]1',
... ]
>>> tokens = set()
>>> for s in train_smiles:
...     tokens = tokens.union(set(c for c in s))
>>> token_list = sorted(list(tokens))
>>> batch_size = len(train_smiles)
>>> MAX_LENGTH = max(len(s) for s in train_smiles)
>>> token_list = token_list + [" "]
>>> input_dict = dict((x, i) for i, x in enumerate(token_list))
>>> n_tokens = len(token_list)
>>> embedding_dimension = 16
>>> model = SeqToSeq(n_tokens, n_tokens, MAX_LENGTH, batch_size=batch_size,
...                  embedding_dimension=embedding_dimension)
>>> inputs = create_input_array(train_smiles, MAX_LENGTH, False, batch_size,
...                             input_dict, " ")
>>> output, embeddings = model([torch.tensor(inputs), torch.tensor([1])])
>>> output.shape
torch.Size([4, 57, 19])
>>> embeddings.shape
torch.Size([4, 16])

References

__init__(n_input_tokens: int, n_output_tokens: int, max_output_length: int, encoder_layers: int = 4, decoder_layers: int = 4, batch_size: int = 100, embedding_dimension: int = 512, dropout: float = 0.0, variational: bool = False, annealing_start_step: int = 5000, annealing_final_step: int = 10000)[source]¶

Initialize SeqToSeq model.

Parameters:

n_input_tokens (int) – Number of input tokens.
n_output_tokens (int) – Number of output tokens.
max_output_length (int) – Maximum length of output sequence.
encoder_layers (int (default 4)) – Number of recurrent layers in the encoder.
decoder_layers (int (default 4)) – Number of recurrent layers in the decoder.
batch_size (int (default 100)) – Batch size of the input.
embedding_dimension (int (default 512)) – Width of the embedding vector. This is also the width of all recurrent layers.
dropout (float (default 0.0)) – Dropout probability to use during training.
variational (bool (default False)) – If True, train the model as a variational autoencoder. This adds random noise to the encoder and also constrains the embedding to follow a unit Gaussian distribution.
annealing_start_step (int (default 5000)) – The step (that is, batch) at which to begin turning on the constraint term for KL cost annealing.
annealing_final_step (int (default 10000)) – The step (that is, batch) at which to finish turning on the constraint term for KL cost annealing.

forward(inputs: List)[source]¶

Generates Embeddings using Encoder then passes it to Decoder to predict output sequences.

Parameters:

inputs (List) – List of two tensors. First tensor is batch of input sequence. Second tensor is the current global_step.

Returns:

output (torch.Tensor) – Predicted output sequence.
_embedding (torch.Tensor) – Embeddings generated by the Encoder.

class FerminetElectronFeature(n_one: List[int], n_two: List[int], no_of_atoms: int, batch_size: int, total_electron: int, spin: List[int])[source]¶

A PyTorch module implementing FermiNet’s electron feature interaction layer [1]. This is a helper class for the Ferminet model.

The layer consists of two types of linear layers: v for the one-electron features and w for the two-electron features. The number and dimensions of these layers depend on the number of atoms and electrons in the molecular system.

References

Examples

>>> import deepchem as dc
>>> import torch
>>> electron_layer = dc.models.torch_models.layers.FerminetElectronFeature([32,32,32],[16,16,16], 4, 8, 10, [5,5])
>>> one_electron_test = torch.randn(8, 10, 4*4)
>>> two_electron_test = torch.randn(8, 10, 10, 4)
>>> one, two = electron_layer.forward(one_electron_test, two_electron_test)
>>> one.size()
torch.Size([8, 10, 32])
>>> two.size()
torch.Size([8, 10, 10, 16])

__init__(n_one: List[int], n_two: List[int], no_of_atoms: int, batch_size: int, total_electron: int, spin: List[int])[source]¶

Parameters:

n_one (List[int]) – List of integer values containing the dimensions of each n_one layer’s output
n_two (List[int]) – List of integer values containing the dimensions of each n_two layer’s output
no_of_atoms (int) – Value containing the number of atoms in the molecule system
batch_size (int) – Value containing the number of batches for the input provided
total_electron (int) – Value containing the total number of electrons in the molecule system
spin (List[int]) – List data structure in the format of [number of up-spin electrons, number of down-spin electrons]
v (torch.nn.ModuleList) – torch ModuleList containing the linear layer with the n_one layer’s dimension size.
w (torch.nn.ModuleList) – torch ModuleList containing the linear layer with the n_two layer’s dimension size.
layer_size (int) – Value containing the number of n_one and n_two layers

forward(one_electron: Tensor, two_electron: Tensor)[source]¶

Parameters:

one_electron (torch.Tensor) – The one-electron feature, with shape (batch_size, number of electrons, number of atoms * 4). For each atom, the last dimension holds the electron’s displacement vector from that atom concatenated with the norm of that vector.
two_electron (torch.Tensor) – The two-electron feature, with shape (batch_size, number of electrons, number of electrons, 4). The last dimension holds the electron’s displacement vector from another electron concatenated with the norm of that vector.

Returns:

one_electron (torch.Tensor) – The one-electron feature after passing through the layer, with shape (batch_size, number of electrons, n_one shape).
two_electron (torch.Tensor) – The two-electron feature after passing through the layer, with shape (batch_size, number of electrons, number of electrons, n_two shape).
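The input shapes described above can be produced from raw coordinates as follows. This is a minimal sketch; ferminet_input_features is a hypothetical helper, and the per-atom ordering [dx, dy, dz, |r|] in the flattened last dimension of the one-electron feature is an assumption that may not match DeepChem's internal layout.

```python
import torch


def ferminet_input_features(electrons: torch.Tensor, atoms: torch.Tensor):
    """Hypothetical helper building one- and two-electron input features.

    electrons: (batch_size, n_electrons, 3) electron coordinates.
    atoms:     (n_atoms, 3) nuclear coordinates.
    """
    batch_size, n_electrons, _ = electrons.shape
    # Displacement of every electron from every atom: (B, E, A, 3).
    ea = electrons[:, :, None, :] - atoms[None, None, :, :]
    ea_norm = torch.linalg.vector_norm(ea, dim=-1, keepdim=True)  # (B, E, A, 1)
    # Concatenate vector and norm per atom, then flatten atoms: (B, E, A * 4).
    one_electron = torch.cat([ea, ea_norm], dim=-1).reshape(batch_size, n_electrons, -1)
    # Displacement of every electron from every electron: (B, E, E, 3).
    ee = electrons[:, :, None, :] - electrons[:, None, :, :]
    ee_norm = torch.linalg.vector_norm(ee, dim=-1, keepdim=True)  # (B, E, E, 1)
    two_electron = torch.cat([ee, ee_norm], dim=-1)               # (B, E, E, 4)
    return one_electron, two_electron


electrons = torch.randn(8, 10, 3)
atoms = torch.randn(4, 3)
one, two = ferminet_input_features(electrons, atoms)
print(one.shape)  # torch.Size([8, 10, 16])
print(two.shape)  # torch.Size([8, 10, 10, 4])
```

With 8 batch entries, 10 electrons and 4 atoms this yields the (8, 10, 16) and (8, 10, 10, 4) inputs used in this section's example.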

class FerminetEnvelope(n_one: List[int], n_two: List[int], total_electron: int, batch_size: int, spin: List[int], no_of_atoms: int, determinant: int)[source]¶

A PyTorch module implementing FermiNet’s envelope layer [1], which is used to calculate the spin-up and spin-down orbital values. This is a helper class for the Ferminet model. The layer consists of four parameter lists (envelope_w, envelope_g, sigma and pi), which are used to calculate the orbital values.

References

Examples

>>> import deepchem as dc
>>> import torch
>>> envelope_layer = dc.models.torch_models.layers.FerminetEnvelope([32, 32, 32], [16, 16, 16], 10, 8, [5, 5], 5, 16)
>>> one_electron = torch.randn(8, 10, 32)
>>> one_electron_permuted = torch.randn(8, 10, 5, 3)
>>> psi, psi_up, psi_down = envelope_layer.forward(one_electron, one_electron_permuted)
>>> psi.size()
torch.Size([8])
>>> psi_up.size()
torch.Size([8, 16, 5, 5])
>>> psi_down.size()
torch.Size([8, 16, 5, 5])

__init__(n_one: List[int], n_two: List[int], total_electron: int, batch_size: int, spin: List[int], no_of_atoms: int, determinant: int)[source]¶

Parameters:

n_one (List[int]) – List of integer values containing the dimensions of each n_one layer’s output
n_two (List[int]) – List of integer values containing the dimensions of each n_two layer’s output
total_electron (int) – Value containing the total number of electrons in the molecule system
batch_size (int) – Value containing the number of batches for the input provided
spin (List[int]) – List data structure in the format of [number of up-spin electrons, number of down-spin electrons]
no_of_atoms (int) – Value containing the number of atoms in the molecule system
determinant (int) – The number of determinants to be incorporated in the post-HF solution.
envelope_w (torch.nn.ParameterList) – torch ParameterList containing the torch Tensor with n_one layer’s dimension size.
envelope_g (torch.nn.ParameterList) – torch ParameterList containing the torch Tensor with the unit dimension size, which acts as bias.
sigma (torch.nn.ParameterList) – torch ParameterList containing the torch Tensor with the unit dimension size.
pi (torch.nn.ParameterList) – torch ParameterList containing the linear layer with the n_two layer’s dimension size.
layer_size (int) – Value containing the number of n_one and n_two layers

forward(one_electron: Tensor, one_electron_vector_permuted: Tensor)[source]¶

Parameters:

one_electron (torch.Tensor) – Torch tensor output by the FerminetElectronFeature layer, with shape (batch_size, number of electrons, n_one layer size).
one_electron_vector_permuted (torch.Tensor) – Torch tensor which is shape permuted vector of the original one_electron vector tensor. shape of the tensor should be (batch_size, number of atoms, number of electrons, 3).

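Returns:

psi (torch.Tensor) – The sampled wavefunction value for each batch element, with shape (batch_size,).
psi_up (torch.Tensor) – The spin-up orbital values.
psi_down (torch.Tensor) – The spin-down orbital values.

Return type:

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]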
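In the FermiNet formulation, the wavefunction combines spin-up and spin-down orbital matrices as a sum over determinants, psi = sum_k det(up_k) * det(down_k), optionally with one weight per determinant. The sketch below assumes unit weights and the (batch_size, determinant, n, n) layout seen in this section's example; combine_determinants is a hypothetical helper, not the layer's code, and a production implementation would usually work with log-determinants for numerical stability.

```python
import torch


def combine_determinants(psi_up: torch.Tensor, psi_down: torch.Tensor) -> torch.Tensor:
    """Sum of det(up) * det(down) over determinants, assuming unit weights.

    psi_up:   (batch_size, determinant, n_up, n_up)
    psi_down: (batch_size, determinant, n_down, n_down)
    Returns:  (batch_size,)
    """
    # torch.linalg.det acts on the last two dimensions -> (batch_size, determinant).
    det_up = torch.linalg.det(psi_up)
    det_down = torch.linalg.det(psi_down)
    return torch.sum(det_up * det_down, dim=-1)


# Identity orbital matrices have det = 1, so each batch entry sums 16 ones.
identity = torch.eye(5).repeat(8, 16, 1, 1)
print(combine_determinants(identity, identity).shape)  # torch.Size([8])
```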

class MXMNetLocalMessagePassing(dim: int, activation_fn: Callable | str = 'silu')[source]¶

The MXMNetLocalMessagePassing class defines a local message passing layer used in the MXMNet model [1]. This layer integrates cross-layer mappings inside the local message passing, allowing for the transformation of input tensors representing pairwise distances and angles between atoms in a molecular system. The layer aggregates information using message passing and updates atom representations accordingly. The 3-step message passing scheme was proposed in [1].

Step 1 contains Message Passing 1 that captures the two-hop angles and related pairwise distances to update edge-level embeddings {mji}.
Step 2 contains Message Passing 2 that captures the one-hop angles and related pairwise distances to further update {mji}.
Step 3 finally aggregates {mji} to update the node-level embedding hi.

These steps in the t-th iteration can be formulated as follows:

Let:

mlp : MultilayerPerceptron
res : ResidualBlock
h : node_features
m : message with radial basis function
idx_kj: Tensor containing indices for the k and j atoms
x_i : The node to be updated
h_i : The hidden state of x_i
x_j : The neighbour node connected to x_i by edge e_ij
h_j : The hidden state of x_j
rbf : Input tensor representing radial basis functions
sbf : Input tensor representing the spherical basis functions
idx_jj : Tensor containing indices for the j and j' atoms, where j' ranges over the other neighbours of i

Step 1: Message Passing 1

m = [h[i] || h[j] || rbf]
m_kj = mlp_kj(m[idx_kj]) * (rbf*W) * mlp_sbf1(sbf1)
m_ji = mlp_ji_1(m) + reduce_sum(m_kj)

Step 2: Message Passing 2

m_jj = mlp_jj(m_ji[idx_jj]) * (rbf*W) * mlp_sbf2(sbf2)
m_ji = mlp_ji_2(m_ji) + reduce_sum(m_jj)

Step 3: Aggregation and Update

In each aggregation step

m = reduce_sum(m_ji*(rbf*W))

In each update step

hm_i = res1(m)
h_i_new = mlp2(hm_i) + h_i
h_i_new = res2(h_i_new)
h_i_new = res3(h_i_new)
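Step 1 above can be sketched in plain PyTorch, with reduce_sum implemented as a scatter-add (Tensor.index_add) over idx_ji_1. This is an illustrative sketch, not the layer's implementation: single nn.Linear layers stand in for the MLPs, W is a hypothetical learned projection of rbf, the feature width is 4 instead of 1, and the index tensors are copied from this section's example. Gathering (rbf*W) with idx_kj, so that it lines up with the per-triplet messages, is an assumption about the notation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim, num_nodes, num_edges, num_triplets = 4, 5, 8, 12

# Toy inputs: node features, per-edge RBF features, per-triplet SBF features.
h = torch.randn(num_nodes, dim)
rbf = torch.randn(num_edges, dim)
sbf1 = torch.randn(num_triplets, dim)
edge_index = torch.tensor([[0, 1, 0, 2, 0, 3, 0, 4],
                           [1, 0, 2, 0, 3, 0, 4, 0]])
idx_kj = torch.tensor([3, 5, 7, 1, 5, 7, 1, 3, 7, 1, 3, 5])
idx_ji_1 = torch.tensor([0, 0, 0, 2, 2, 2, 4, 4, 4, 6, 6, 6])

# Single linear layers standing in for mlp_kj, mlp_ji_1, mlp_sbf1 and W.
mlp_kj = nn.Linear(3 * dim, dim)
mlp_ji_1 = nn.Linear(3 * dim, dim)
mlp_sbf1 = nn.Linear(dim, dim)
W = nn.Linear(dim, dim, bias=False)

j, i = edge_index
m = torch.cat([h[i], h[j], rbf], dim=-1)  # m = [h[i] || h[j] || rbf], (num_edges, 3 * dim)

# Per-triplet messages: edge quantities are gathered with idx_kj.
m_kj = mlp_kj(m[idx_kj]) * W(rbf[idx_kj]) * mlp_sbf1(sbf1)  # (num_triplets, dim)

# reduce_sum: add every triplet message onto the edge listed in idx_ji_1.
summed = torch.zeros(num_edges, dim).index_add(0, idx_ji_1, m_kj)
m_ji = mlp_ji_1(m) + summed
print(m_ji.shape)  # torch.Size([8, 4])
```

Edges that never appear in idx_ji_1 receive only the mlp_ji_1 term; Step 2 follows the same gather, multiply and scatter-add pattern with idx_jj, sbf2 and idx_ji_2.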

References

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import MXMNetLocalMessagePassing
>>> dim = 1
>>> h = torch.tensor([[0.8343], [1.2713], [1.2713], [1.2713], [1.2713]])
>>> rbf = torch.tensor([[-0.2628], [-0.2628], [-0.2628], [-0.2628],
...                     [-0.2629], [-0.2629], [-0.2628], [-0.2628]])
>>> sbf1 = torch.tensor([[-0.2767], [-0.2767], [-0.2767], [-0.2767],
...                      [-0.2767], [-0.2767], [-0.2767], [-0.2767],
...                      [-0.2767], [-0.2767], [-0.2767], [-0.2767]])
>>> sbf2 = torch.tensor([[-0.0301], [-0.0301], [-0.1483], [-0.1486], [-0.1484],
...                      [-0.0301], [-0.1483], [-0.0301], [-0.1485], [-0.1483],
...                      [-0.0301], [-0.1486], [-0.1485], [-0.0301], [-0.1486],
...                      [-0.0301], [-0.1484], [-0.1483], [-0.1486], [-0.0301]])
>>> idx_kj = torch.tensor([3, 5, 7, 1, 5, 7, 1, 3, 7, 1, 3, 5])
>>> idx_ji_1 = torch.tensor([0, 0, 0, 2, 2, 2, 4, 4, 4, 6, 6, 6])
>>> idx_jj = torch.tensor([0, 1, 3, 5, 7, 2, 1, 3, 5, 7, 4, 1, 3, 5, 7, 6, 1, 3, 5, 7])
>>> idx_ji_2 = torch.tensor([0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6, 7, 7, 7, 7])
>>> edge_index = torch.tensor([[0, 1, 0, 2, 0, 3, 0, 4],
...                           [1, 0, 2, 0, 3, 0, 4, 0]])
>>> out = MXMNetLocalMessagePassing(dim, activation_fn='silu')
>>> output = out(h,
...             rbf,
...             sbf1,
...             sbf2,
...             idx_kj,
...             idx_ji_1,
...             idx_jj,
...             idx_ji_2,
...             edge_index)
>>> output[0].shape
torch.Size([5, 1])
>>> output[1].shape
torch.Size([5, 1])

__init__(dim: int, activation_fn: Callable | str = 'silu')[source]¶

Initializes the MXMNetLocalMessagePassing layer.

Parameters:

dim (int) – The dimension of the input and output tensors for the local message passing layer.
activation_fn (Union[Callable, str], optional (default: 'silu')) – The activation function to be used in the multilayer perceptrons (MLPs) within the layer.

forward(node_features: Tensor, rbf: Tensor, sbf1: Tensor, sbf2: Tensor, idx_kj: Tensor, idx_ji_1: Tensor, idx_jj: Tensor, idx_ji_2: Tensor, edge_index: Tensor) → Tuple[Tensor, Tensor][source]¶

The forward method performs the computation for the MXMNetLocalMessagePassing Layer. This method processes the input tensors representing atom features, radial basis functions (RBF), and spherical basis functions (SBF) using message passing over the molecular graph. The message passing updates the atom representations, and the resulting tensor represents the updated atom feature after local message passing.

Parameters:

node_features (torch.Tensor) – Input tensor representing atom features.
rbf (torch.Tensor) – Input tensor representing radial basis functions.
sbf1 (torch.Tensor) – Input tensor representing the first set of spherical basis functions.
sbf2 (torch.Tensor) – Input tensor representing the second set of spherical basis functions.
idx_kj (torch.Tensor) – Tensor containing indices for the k and j atoms involved in each interaction.
idx_ji_1 (torch.Tensor) – Tensor containing indices for the j and i atoms involved in the first message passing step.
idx_jj (torch.Tensor) – Tensor containing indices for the j and j’ atoms involved in the second message passing step.
idx_ji_2 (torch.Tensor) – Tensor containing indices for the j and i atoms involved in the second message passing step.
edge_index (torch.Tensor) – Tensor containing the edge indices of the molecular graph, with shape (2, M), where M is the number of edges.

Returns:

node_features (torch.Tensor) – Updated atom representations after local message passing.
output (torch.Tensor) – Output tensor representing a fixed-size representation, with shape (N, 1).

class MXMNetSphericalBasisLayer(num_spherical: int, num_radial: int, cutoff: float = 5.0, envelope_exponent: int = 5)[source]¶

This layer takes pairwise distances and angles between atoms as input and combines radial basis functions with spherical harmonic functions to generate a fixed-size representation that captures both radial and orientation information. This type of representation is commonly used in molecular modeling and simulations to capture the behavior of atoms and molecules in chemical systems.

Inside the initialization, Bessel basis functions and real spherical harmonic functions are generated. The Bessel basis functions capture the radial information, and the spherical harmonic functions capture the orientation information. These functions are generated based on the provided num_spherical and num_radial parameters.
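How radial and angular functions combine into a fixed-size representation can be illustrated with a simplified basis. This sketch is a stand-in under stated assumptions, not MXMNet's exact functions: for the angular part it uses Legendre polynomials P_l(cos(angle)), which are proportional to the m = 0 real spherical harmonics; for the radial part it uses sin(n * pi * d) / d with d = dist / cutoff (proportional to the zeroth-order spherical Bessel function j_0(n * pi * d)) for every l; and it applies no envelope. legendre and simple_spherical_basis are hypothetical names, and distances are assumed to be strictly positive.

```python
import math

import torch


def legendre(x: torch.Tensor, num_l: int) -> torch.Tensor:
    """Legendre polynomials P_0 .. P_{num_l - 1} of x, stacked on the last axis."""
    polys = [torch.ones_like(x), x]
    # Bonnet recurrence: (l + 1) P_{l+1} = (2l + 1) x P_l - l P_{l-1}.
    for l in range(1, num_l - 1):
        polys.append(((2 * l + 1) * x * polys[l] - l * polys[l - 1]) / (l + 1))
    return torch.stack(polys[:num_l], dim=-1)


def simple_spherical_basis(dist, angle, idx_kj, num_spherical, num_radial, cutoff):
    """Simplified radial x angular basis; not MXMNet's exact functions."""
    n = torch.arange(1, num_radial + 1, dtype=dist.dtype)
    d = dist.unsqueeze(-1) / cutoff                      # (num_edges, 1), scaled distance
    radial = torch.sin(n * math.pi * d) / d              # (num_edges, num_radial)
    angular = legendre(torch.cos(angle), num_spherical)  # (num_triplets, num_spherical)
    # Radial features of each triplet's k-j edge, outer product with the angular part.
    out = angular.unsqueeze(-1) * radial[idx_kj].unsqueeze(-2)
    return out.reshape(-1, num_spherical * num_radial)   # (num_triplets, S * R)


dist = torch.tensor([0.5, 1.0, 2.0, 3.0])
angle = torch.tensor([0.1, 0.2, 0.3, 0.4])
idx_kj = torch.tensor([0, 1, 2, 3])
out = simple_spherical_basis(dist, angle, idx_kj, num_spherical=2, num_radial=2, cutoff=2.0)
print(out.shape)  # torch.Size([4, 4])
```

With the inputs from this section's example, the output width is num_spherical * num_radial = 4, matching the documented output shape.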

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import MXMNetSphericalBasisLayer
>>> dist = torch.tensor([0.5, 1.0, 2.0, 3.0])
>>> angle = torch.tensor([0.1, 0.2, 0.3, 0.4])
>>> idx_kj = torch.tensor([0, 1, 2, 3])
>>> spherical_layer = MXMNetSphericalBasisLayer(envelope_exponent=2, num_spherical=2, num_radial=2, cutoff=2.0)
>>> output = spherical_layer(dist, angle, idx_kj)
>>> output.shape
torch.Size([4, 4])

__init__(num_spherical: int, num_radial: int, cutoff: float = 5.0, envelope_exponent: int = 5)[source]¶

Initialize the MXMNetSphericalBasisLayer.

Parameters:

num_spherical (int) – The number of spherical harmonic functions to use. These functions capture orientation information related to atom positions.
num_radial (int) – The number of radial basis functions to use. These functions capture information about pairwise distances between atoms.
cutoff (float, optional (default 5.0)) – The cutoff distance for the radial basis functions. It specifies the distance beyond which the interactions are ignored.
envelope_exponent (int, optional (default 5)) – The exponent for the envelope function. It controls the degree of damping for the radial basis functions.

forward(dist: Tensor, angle: Tensor, idx_kj: Tensor) → Tensor[source]¶

Forward pass of the MXMNetSphericalBasisLayer.

Parameters:

dist (torch.Tensor) – Input tensor representing pairwise distances between atoms.
angle (torch.Tensor) – Input tensor representing pairwise angles between atoms.
idx_kj (torch.Tensor) – Tensor containing indices for the k and j atoms.

Returns:

output – The output tensor containing the fixed-size representation.

Return type:

torch.Tensor

class HighwayLayer(d_input: int, activation_fn: Callable | str = 'relu')[source]¶

Highway layer from “Training Very Deep Networks” [1]

\[y = H(x) \cdot T(x) + x \cdot C(x)\]

where, as in the original paper:

H(x): a one-layer neural network with a nonlinear activation
T(x): a one-layer neural network with a sigmoid activation
C(x) = 1 - T(x)

The output has the same dimension as the input.
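The gating equation above can be written out directly. This is a minimal sketch assuming ReLU for H (the HighwayLayer default activation) and single nn.Linear layers for H and T; MinimalHighway is a hypothetical name, not the HighwayLayer implementation.

```python
import torch
import torch.nn as nn


class MinimalHighway(nn.Module):
    """y = H(x) * T(x) + x * C(x) with C(x) = 1 - T(x)."""

    def __init__(self, d_input: int):
        super().__init__()
        self.H = nn.Linear(d_input, d_input)  # transform branch
        self.T = nn.Linear(d_input, d_input)  # gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.H(x))     # H(x), nonlinear transform
        t = torch.sigmoid(self.T(x))  # T(x), the transform gate in (0, 1)
        return h * t + x * (1.0 - t)  # C(x) = 1 - T(x) carries x through


layer = MinimalHighway(20)
print(layer(torch.randn(16, 20)).shape)  # torch.Size([16, 20])
```

Because H and T both map d_input to d_input, the output keeps the input's shape. When T(x) is close to 0 the layer passes x through almost unchanged, which is why the highway paper suggests initializing the gate bias to a negative value.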

References

Examples

>>> import torch
>>> from deepchem.models.torch_models.layers import HighwayLayer
>>> x = torch.randn(16, 20)
>>> highway_layer = HighwayLayer(d_input=x.shape[1])
>>> y = highway_layer(x)
>>> x.shape
torch.Size([16, 20])
>>> y.shape
torch.Size([16, 20])

__init__(d_input: int, activation_fn: Callable | str = 'relu')[source]¶

Initializes the HighwayLayer.

Parameters:

d_input (int) – The dimension of the input layer.
activation_fn (Union[Callable, str], optional (default 'relu')) – The activation function to use for H(x).

forward(x: Tensor) → Tensor[source]¶

Forward pass of the HighwayLayer.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, d_input).
Returns:: output – Output tensor of shape (batch_size, d_input).
Return type:: torch.Tensor

class GraphConv(out_channel: int, number_input_features: int, min_deg: int = 0, max_deg: int = 10, activation_fn: Callable | None = None, **kwargs)[source]¶

Graph Convolutional Layers

This layer implements the graph convolution introduced in [1]. The graph convolution combines per-node feature vectors in a nonlinear fashion with the feature vectors of neighboring nodes. This “blends” information in local neighborhoods of a graph.
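The degree-dependent blending described above can be sketched on a dense adjacency matrix: for every node, sum the neighbours' features, then apply a neighbour weight and a self weight selected by the node's degree. This is a simplified sketch, not the GraphConv implementation. DegreeGraphConvSketch is a hypothetical name; the dense adjacency input and ReLU activation are assumptions; and it keeps max_deg + 1 weight pairs, whereas DeepChem's layer allocates 2 * max_deg + (1 - min_deg) weight matrices and works on degree-sorted adjacency lists.

```python
import torch
import torch.nn as nn


class DegreeGraphConvSketch(nn.Module):
    """Simplified degree-aware graph convolution on a dense 0/1 adjacency matrix."""

    def __init__(self, in_features: int, out_features: int, max_deg: int = 10):
        super().__init__()
        self.out_features = out_features
        # One neighbour weight and one self weight for every degree 0..max_deg.
        self.w_neigh = nn.ModuleList(
            [nn.Linear(in_features, out_features) for _ in range(max_deg + 1)])
        self.w_self = nn.ModuleList(
            [nn.Linear(in_features, out_features, bias=False) for _ in range(max_deg + 1)])

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        degree = adj.sum(dim=1).long()  # number of neighbours per node
        neigh_sum = adj @ x             # summed neighbour features per node
        out = x.new_zeros(x.shape[0], self.out_features)
        for deg in degree.unique().tolist():
            mask = degree == deg
            out[mask] = self.w_neigh[deg](neigh_sum[mask]) + self.w_self[deg](x[mask])
        return torch.relu(out)


# 'CCC' (atoms 0-1-2 in a chain) and 'C' (isolated atom 3), as in the example.
adj = torch.zeros(4, 4)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
x = torch.randn(4, 75)
conv = DegreeGraphConvSketch(in_features=75, out_features=2)
out = conv(x, adj)
print(out.shape)  # torch.Size([4, 2])
```

Using separate weights per degree lets atoms with different numbers of bonds be transformed differently, which is the motivation for the per-degree weight matrices in the real layer.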

Example

>>> import deepchem as dc
>>> import torch
>>> import numpy as np
>>> import deepchem.models.torch_models.layers as torch_layers
>>> out_channels = 2
>>> n_atoms = 4  # In CCC and C, there are 4 atoms
>>> raw_smiles = ['CCC', 'C']
>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in raw_smiles]
>>> featurizer = dc.feat.graph_features.ConvMolFeaturizer()
>>> mols = featurizer.featurize(mols)
>>> multi_mol = dc.feat.mol_graphs.ConvMol.agglomerate_mols(mols)
>>> atom_features = torch.from_numpy(multi_mol.get_atom_features().astype(np.float32))
>>> degree_slice = torch.from_numpy(multi_mol.deg_slice)
>>> membership = torch.from_numpy(multi_mol.membership)
>>> deg_adjs = [torch.from_numpy(i) for i in multi_mol.get_deg_adjacency_lists()[1:]]
>>> args = [atom_features, degree_slice, membership] + deg_adjs
>>> layer = torch_layers.GraphConv(out_channels, number_input_features=atom_features.shape[-1])
>>> result = layer(args)
>>> type(result)
<class 'torch.Tensor'>
>>> result.shape
torch.Size([4, 2])
>>> num_deg = 2 * layer.max_degree + (1 - layer.min_degree)
>>> num_deg
21

References

__init__(out_channel: int, number_input_features: int, min_deg: int = 0, max_deg: int = 10, activation_fn: Callable | None = None, **kwargs)[source]¶

Initialize a graph convolutional layer.

Parameters:

out_channel (int) – The number of output channels per graph node.
number_input_features (int) – The number of input features.
min_deg (int, optional (default 0)) – The minimum allowed degree for each graph node.
max_deg (int, optional (default 10)) – The maximum allowed degree for each graph node. Note that this is set to 10 to handle complex molecules (some organometallic compounds have strange structures). If you’re using this for non-molecular applications, you may need to set this much higher depending on your dataset.
activation_fn (function) – A nonlinear activation function to apply. If you’re not sure, torch.nn.ReLU is probably a good default for your application.

forward(inputs: List[Tensor]) → Tensor[source]¶

The forward pass combines per-node feature vectors in a nonlinear fashion with the feature vectors of neighboring nodes.

Parameters:: inputs (List[torch.Tensor]) – Should contain atom features and arrays describing graph topology.
Returns:: Combined atom features.
Return type:: torch.Tensor

sum_neigh(atoms: Tensor, deg_adj_lists) → List[ndarray][source]¶

Stores the summed atoms by degree.

class GraphPool(min_degree: int = 0, max_degree: int = 10, **kwargs)[source]¶

A GraphPool gathers data from local neighborhoods of a graph.

This layer performs max-pooling over the feature vectors of atoms in a neighborhood. You can think of it as analogous to a max-pooling layer for 2D convolutions, except that it operates on graphs. This technique is described in [1].
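The neighbourhood max-pooling described above can be sketched with a dense adjacency matrix: each node's new feature vector is the elementwise maximum over itself and its neighbours. This is a simplified sketch, not the GraphPool implementation; graph_max_pool is a hypothetical name, and the dense adjacency input is an assumption (the layer itself consumes the degree slices and adjacency lists shown in the example). Like the layer, it preserves the shape of the atom feature matrix.

```python
import torch


def graph_max_pool(x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """Elementwise max over each node's closed neighbourhood (itself + neighbours)."""
    num_nodes = adj.shape[0]
    # Add self-loops so every node also pools over its own features.
    closed = adj.bool() | torch.eye(num_nodes, dtype=torch.bool)
    # candidates[i, j] holds x[j]; entries for non-neighbours become -inf.
    candidates = x.unsqueeze(0).expand(num_nodes, -1, -1)
    candidates = candidates.masked_fill(~closed.unsqueeze(-1), float("-inf"))
    return candidates.max(dim=1).values


# 'CCC' (atoms 0-1-2 in a chain) and 'C' (isolated atom 3) with one feature each.
adj = torch.zeros(4, 4)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
x = torch.tensor([[1.0], [5.0], [2.0], [7.0]])
pooled = graph_max_pool(x, adj)
print(pooled)  # expected values: 5, 5, 5, 7 (one row per atom)
```

The middle carbon's large value spreads to both of its neighbours, while the isolated atom keeps its own value.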

Example

>>> import deepchem as dc
>>> import torch
>>> import numpy as np
>>> import deepchem.models.torch_models.layers as torch_layers
>>> n_atoms = 4  # In CCC and C, there are 4 atoms
>>> raw_smiles = ['CCC', 'C']
>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in raw_smiles]
>>> featurizer = dc.feat.graph_features.ConvMolFeaturizer()
>>> mols = featurizer.featurize(mols)
>>> multi_mol = dc.feat.mol_graphs.ConvMol.agglomerate_mols(mols)
>>> atom_features = torch.from_numpy(multi_mol.get_atom_features().astype(np.float32))
>>> degree_slice = torch.from_numpy(multi_mol.deg_slice)
>>> membership = torch.from_numpy(multi_mol.membership)
>>> deg_adjs = [torch.from_numpy(i) for i in multi_mol.get_deg_adjacency_lists()[1:]]
>>> args = [atom_features, degree_slice, membership] + deg_adjs
>>> result = torch_layers.GraphPool()(args)
>>> type(result)
<class 'torch.Tensor'>
>>> result.shape
torch.Size([4, 75])

References

__init__(min_degree: int = 0, max_degree: int = 10, **kwargs)[source]¶

Initialize this layer.

Parameters:

min_degree (int, optional (default 0)) – The minimum allowed degree for each graph node.
max_degree (int, optional (default 10)) – The maximum allowed degree for each graph node. Note that this is set to 10 to handle complex molecules (some organometallic compounds have strange structures). If you’re using this for non-molecular applications, you may need to set this much higher depending on your dataset.

get_config() → str[source]¶

Returns a string representation of the object.

Returns:: A string containing the class name followed by the values of its instance variables.
Return type:: str

forward(inputs: List[Tensor]) → Tensor[source]¶

The forward pass performs max-pooling over the feature vectors of atoms in a neighborhood.

Parameters:

inputs (List[torch.Tensor]) – Should contain atom features and arrays describing graph topology.

Returns:

The max-pooled atom features, with the same shape as the input atom features.

Return type:

torch.Tensor

class GraphGather(batch_size: int, activation_fn: Callable | None = None, **kwargs)[source]¶

A GraphGather layer pools node-level feature vectors to create a graph feature vector.

Many graph convolutional networks manipulate feature vectors per graph-node. For a molecule for example, each node might represent an atom, and the network would manipulate atomic feature vectors that summarize the local chemistry of the atom. However, at the end of the application, we will likely want to work with a molecule level feature representation. The GraphGather layer creates a graph level feature vector by combining all the node-level feature vectors.

One subtlety about this layer is that it depends on the batch_size, for internal implementation reasons. The GraphConv and GraphPool layers pool the nodes of all graphs in a batch together; the GraphGather layer reassembles these jumbled node feature vectors into per-graph feature vectors.
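The reassembly step can be sketched with the membership vector, which records the graph each node belongs to. This sketch assumes a readout that concatenates the per-graph sum and the per-graph max of node features; that assumption is consistent with the 2 * 75 = 150 output width in the example, but gather_sum_max is a hypothetical helper, not the GraphGather implementation, and it assumes every graph in the batch has at least one node.

```python
import torch


def gather_sum_max(x: torch.Tensor, membership: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Per-graph readout: concatenate the sum and the max of node features."""
    pooled = []
    for graph in range(batch_size):
        nodes = x[membership == graph]  # rows of x belonging to this graph
        pooled.append(torch.cat([nodes.sum(dim=0), nodes.max(dim=0).values]))
    return torch.stack(pooled)  # (batch_size, 2 * num_features)


# 'CCC' contributes atoms 0-2 and 'C' contributes atom 3, as in the example.
x = torch.tensor([[1.0, 2.0], [3.0, 0.0], [0.0, 5.0], [4.0, 4.0]])
membership = torch.tensor([0, 0, 0, 1])
print(gather_sum_max(x, membership, batch_size=2))
```

Because the loop runs over range(batch_size), the number of output rows is fixed by the batch size, which mirrors why the layer needs batch_size at construction time.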

Example

>>> import deepchem as dc
>>> import torch
>>> import numpy as np
>>> import deepchem.models.torch_models.layers as torch_layers
>>> batch_size = 2
>>> raw_smiles = ['CCC', 'C']
>>> from rdkit import Chem
>>> mols = [Chem.MolFromSmiles(s) for s in raw_smiles]
>>> featurizer = dc.feat.graph_features.ConvMolFeaturizer()
>>> mols = featurizer.featurize(mols)
>>> multi_mol = dc.feat.mol_graphs.ConvMol.agglomerate_mols(mols)
>>> atom_features = torch.from_numpy(multi_mol.get_atom_features().astype(np.float32))
>>> degree_slice = torch.from_numpy(multi_mol.deg_slice)
>>> membership = torch.from_numpy(multi_mol.membership)
>>> deg_adjs = [torch.from_numpy(i) for i in multi_mol.get_deg_adjacency_lists()[1:]]
>>> args = [atom_features, degree_slice, membership] + deg_adjs
>>> result = torch_layers.GraphGather(batch_size)(args)
>>> type(result)
<class 'torch.Tensor'>
>>> result.shape
torch.Size([2, 150])

References

__init__(batch_size: int, activation_fn: Callable | None = None, **kwargs)[source]¶

Initialize this layer.

Parameters:

batch_size (int) – The batch size for this layer. Note that the layer’s behavior changes depending on the batch size.
activation_fn (function) – A nonlinear activation function to apply. If you’re not sure, relu is probably a good default for your application.

get_config() → str[source]¶

Returns a string representation of the object.

Returns:: A string containing the class name followed by the values of its instance variables.
Return type:: str

forward(inputs: List[Tensor])[source]¶

Invoking this layer.

Parameters:

inputs (List[torch.Tensor]) – A list of the form [atom_features, deg_slice, membership, deg_adj_list…], where the trailing entries are the degree adjacency lists. These are the tensors created or processed by GraphConv and GraphPool.

Returns:: Graph-level feature tensor with one row per graph in the batch.
Return type:: torch.Tensor

class ClampExp(lambda_param: float = 1.0)[source]¶

A nonlinearity layer that clamps the input tensor by taking the element-wise minimum of exp(λx) and 1, where λ is lambda_param.

\[f(x) = \min(\exp(\lambda x), 1)\]
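
The formula above can be checked with a few lines of plain PyTorch. The function clamp_exp below is a hypothetical stand-in for the layer, written directly from the formula rather than from DeepChem's source.

```python
import torch

def clamp_exp(x, lambda_param=1.0):
    # f(x) = min(exp(lambda * x), 1), applied element-wise
    return torch.clamp(torch.exp(lambda_param * x), max=1.0)

x = torch.tensor([-1.0, 0.5, 0.6, 0.7])
out = clamp_exp(x)
print(out)  # tensor([0.3679, 1.0000, 1.0000, 1.0000])

# For lambda > 0, exp is monotonic, so this equals exp(lambda * min(x, 0)):
# outputs always lie in (0, 1].
alt = torch.exp(1.0 * torch.clamp(x, max=0.0))
print(torch.allclose(out, alt))  # True
```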

Example

>>> import torch
>>> from deepchem.models.torch_models.flows import ClampExp
>>> lambda_param = 1.0
>>> clamp_exp = ClampExp(lambda_param)
>>> input = torch.tensor([-1 ,0.5, 0.6, 0.7])
>>> clamp_exp(input)
tensor([0.3679, 1.0000, 1.0000, 1.0000])

__init__(lambda_param: float = 1.0) → None[source]¶

Initializes the ClampExp layer

Parameters:: lambda_param (float) – Lambda parameter for the ClampExp layer

forward(x: Tensor) → Tensor[source]¶

Forward pass of the ClampExp layer

Parameters:: x (torch.Tensor) – Input tensor
Returns:: Transformed tensor with the same shape as x.
Return type:: torch.Tensor

class ConstScaleLayer(scale: float = 1.0)[source]¶

This layer scales the input tensor by a fixed factor.

Example

>>> import torch
>>> from deepchem.models.torch_models.flows import ConstScaleLayer
>>> scale = 2.0
>>> const_scale = ConstScaleLayer(scale)
>>> input = torch.tensor([1, 2, 3])
>>> const_scale(input)
tensor([2., 4., 6.])

__init__(scale: float = 1.0)[source]¶

Initializes the ConstScaleLayer

Parameters:: scale (float) – Scaling factor

forward(input: Tensor) → Tensor[source]¶

Forward pass of the ConstScaleLayer

Parameters:: input (torch.Tensor) – Input tensor
Returns:: Scaled tensor
Return type:: torch.Tensor

cosine_dist(x: Tensor, y: Tensor) → Tensor[source]¶

Compute the pairwise cosine similarity between two tensors, that is, the inner products of their L2-normalized vectors.

Parameters:

x (torch.Tensor) – Input tensor of shape (B, N, P) representing the first set of vectors.
y (torch.Tensor) – Input tensor of shape (B, M, P) representing the second set of vectors.

Returns:

Cosine similarity tensor of shape (B, N, M) where each entry represents the cosine similarity between vectors from x and y.

Return type:

torch.Tensor

Examples

The cosine similarity between two identical (nonzero) vectors is 1. If every element of x and y has the same value, every row of x is parallel to every row of y, so the output is a tensor of 1s. For inputs x and y that both have shape (n, p), the output has shape (n, n) with 1 in every entry.

>>> import torch
>>> import deepchem as dc
>>> import numpy as np
>>> import deepchem.models.torch_models.layers as torch_layers
>>> x = torch.ones((6, 4), dtype=torch.float32)
>>> y_same = torch.ones((6, 4), dtype=torch.float32)
>>> cos_sim_same = torch_layers.cosine_dist(x, y_same)

x and y_same are identical tensors (every element equals 1), so the cosine similarity between every row of x and every row of y_same is 1. The output tensor has shape (6, 6).

>>> diff = cos_sim_same - torch.ones((6, 6), dtype=torch.float32)
>>> np.allclose(0.0, diff.sum().item(), atol=1e-05)
True
>>> cos_sim_same.shape
torch.Size([6, 6])

The cosine similarity between two orthogonal vectors is 0 by definition. If every row of x is orthogonal to every row of y, the output is a tensor of 0s. In the following example, each row of x1 is orthogonal to each row of x2 because x1 and x2 are the top and bottom halves of an identity matrix.

>>> identity_tensor = torch.eye(512, dtype=torch.float32)
>>> x1 = identity_tensor[0:256,:]
>>> x2 = identity_tensor[256:512,:]
>>> cos_sim_orth = torch_layers.cosine_dist(x1, x2)

Since each row of x1 is orthogonal to each row of x2, every pairwise cosine similarity is 0. Since both input tensors have shape (256, 512), the output tensor has shape (256, 256).

>>> np.allclose(0.0, cos_sim_orth.sum().item(), atol=1e-05)
True
>>> cos_sim_orth.shape
torch.Size([256, 256])
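
The definition used above (cosine similarity as the inner product of L2-normalized rows) can be sketched in plain PyTorch. The name pairwise_cosine is hypothetical and this is not DeepChem's implementation; because it uses transpose(-1, -2) and the @ operator, the sketch accepts both (N, P) inputs and batched (B, N, P) inputs.

```python
import torch
import torch.nn.functional as F

def pairwise_cosine(x, y):
    """Cosine similarity between every row of x and every row of y."""
    x_unit = F.normalize(x, p=2, dim=-1)  # scale each row to unit L2 norm
    y_unit = F.normalize(y, p=2, dim=-1)
    return x_unit @ y_unit.transpose(-1, -2)  # inner products of unit vectors

x = torch.tensor([[1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([[2.0, 0.0], [0.0, 3.0]])
sim = pairwise_cosine(x, y)
print(sim)  # approximately [[1.0, 0.0], [0.7071, 0.7071]]
```

Normalizing first makes the result independent of vector length, which is why y's rows of length 2 and 3 still give a similarity of exactly 1 with the parallel row of x.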

class DAGLayer(n_graph_feat: int = 30, n_atom_feat: int = 75, max_atoms: int = 50, layer_sizes: List[int] = [100], init: str = 'xavier_uniform', activation: str = 'relu', dropout: float | None = None, batch_size: int = 64, device: device | None = device(type='cpu'), **kwargs: Any)[source]¶

DAG computation layer implemented in PyTorch. It computes graph features for each atom recursively from the features of its neighbors.

Example

>>> import numpy as np
>>> from deepchem.models.torch_models.layers import DAGLayer
>>> np.random.seed(123)
>>> batch_size, n_graph_feat, n_atom_feat, max_atoms = 10, 30, 75, 50
>>> layer_sizes = [100]
>>> layer = DAGLayer(n_graph_feat, n_atom_feat, max_atoms, layer_sizes)
>>> atom_features = np.random.rand(batch_size, n_atom_feat)
>>> parents = np.random.randint(0, max_atoms, (batch_size, max_atoms, max_atoms))
>>> calc_orders = np.random.randint(0, batch_size, (batch_size, max_atoms))
>>> calc_masks = np.random.randint(0, 2, (batch_size, max_atoms))
>>> n_atoms = batch_size
>>> outputs = layer([atom_features, parents, calc_orders, calc_masks, np.array(n_atoms)])


__init__(n_graph_feat: int = 30, n_atom_feat: int = 75, max_atoms: int = 50, layer_sizes: List[int] = [100], init: str = 'xavier_uniform', activation: str = 'relu', dropout: float | None = None, batch_size: int = 64, device: device | None = device(type='cpu'), **kwargs: Any) → None[source]¶

Parameters:

n_graph_feat (int, optional) – Number of features for each node (and for the whole graph).
n_atom_feat (int, optional) – Number of features listed per atom.
max_atoms (int, optional) – Maximum number of atoms in molecules.
layer_sizes (list of int, optional(default=[100])) – List of hidden layer size(s): length of this list represents the number of hidden layers, and each element is the width of corresponding hidden layer.
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied.
dropout (float, optional) – Dropout probability in hidden layer(s).
batch_size (int, optional) – Number of molecules in a batch.
device (str, optional) – Device used for computation.

get_config() → Dict[str, Any][source]¶: Get the configuration of the DAGLayer.

forward(inputs, training=True) → torch.Tensor

Parameters:

inputs (List[Union[torch.Tensor, np.ndarray]]) – A list of tensors containing: 1. atom_features of shape (batch_size, n_atom_feat) 2. parents of shape (batch_size, max_atoms, max_atoms) 3. calculation_orders of shape (batch_size, max_atoms) 4. calculation_masks of shape (batch_size, max_atoms) 5. n_atoms (scalar value representing number of atoms)
training (bool, optional) – Whether the model is training or not, by default True

Returns:

Output feature tensor of shape (number of max_atom-th target atoms, n_outputs).

Return type:

torch.Tensor

class DAGGather(n_graph_feat: int = 30, n_outputs: int = 30, max_atoms: int = 50, layer_sizes: List[int] = [100], init: str = 'glorot_uniform', activation: str = 'relu', dropout: float | None = None, device: device | None = device(type='cpu'), **kwargs: Any)[source]¶

DAG vector gathering layer in PyTorch. It is used to gather graph features and combine them based on their membership.
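
Combining atom features "based on their membership" can be sketched as a per-molecule sum followed by a small MLP. This design is an assumption inferred from the documented output shape (membership.max() + 1, n_outputs); the function dag_gather_sketch and the MLP widths are hypothetical, not DeepChem's DAGGather.

```python
import torch
import torch.nn as nn

def dag_gather_sketch(atom_features, membership, mlp):
    """Hypothetical DAGGather-style readout: per-molecule sum, then an MLP."""
    n_mols = int(membership.max().item()) + 1  # one output row per molecule index
    summed = torch.zeros(n_mols, atom_features.shape[1], dtype=atom_features.dtype)
    summed.index_add_(0, membership, atom_features)  # add each atom row into its molecule's row
    return mlp(summed)

torch.manual_seed(0)
n_graph_feat, n_outputs = 30, 75
mlp = nn.Sequential(nn.Linear(n_graph_feat, 100), nn.ReLU(), nn.Linear(100, n_outputs))
atom_features = torch.rand(10, n_graph_feat)
membership = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 2, 3])  # sorted molecule indices
out = dag_gather_sketch(atom_features, membership, mlp)
print(out.shape)  # torch.Size([4, 75])
```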

Example

>>> import numpy as np
>>> from deepchem.models.torch_models.layers import DAGGather
>>> np.random.seed(123)
>>> batch_size, n_graph_feat, n_atom_feat, n_outputs = 10, 30, 30, 75
>>> max_atoms = 50
>>> layer_sizes = [100]
>>> layer = DAGGather(n_graph_feat, n_outputs, max_atoms, layer_sizes)
>>> atom_features = np.random.rand(batch_size, n_atom_feat)
>>> membership = np.sort(np.random.randint(0, batch_size, size=(batch_size)))
>>> outputs = layer([atom_features, membership])


__init__(n_graph_feat: int = 30, n_outputs: int = 30, max_atoms: int = 50, layer_sizes: List[int] = [100], init: str = 'glorot_uniform', activation: str = 'relu', dropout: float | None = None, device: device | None = device(type='cpu'), **kwargs: Any) → None[source]¶

Parameters:

n_graph_feat (int, optional) – Number of features for each atom.
n_outputs (int, optional) – Number of features for each molecule.
max_atoms (int, optional) – Maximum number of atoms in molecules.
layer_sizes (list of int, optional) – List of hidden layer size(s): length of this list represents the number of hidden layers, and each element is the width of corresponding hidden layer.
init (str, optional) – Weight initialization for filters.
activation (str, optional) – Activation function applied.
dropout (float, optional) – Dropout probability in the hidden layer(s).
device (str, optional) – Device used for computation

get_config() → Dict[str, Any][source]¶: Returns a dictionary containing the configuration of the layer.

forward(inputs: Tuple[Tensor | ndarray, Tensor | ndarray], training: bool = True) → Tensor[source]¶

Parameters:

inputs (List[Union[torch.Tensor, np.ndarray]]) – A list of tensors containing: 1. atom_features of shape (batch_size, n_graph_feat) 2. membership of shape (batch_size,) with sorted membership indices
training (bool, optional) – Whether the model is training or not, by default True

Returns:

Output feature tensor of shape (membership.max() + 1, n_outputs).

Return type:

torch.Tensor

class SpectralConv(in_channels: int, out_channels: int, modes: int | Tuple[int, ...] | List[int], dims: int = 2)[source]¶

n-Dimensional Fourier layer.

It applies an n-dimensional FFT on the spatial dimensions, keeps only a specified number of Fourier modes (for each spatial dimension), applies a learned complex multiplication (einsum), and returns to physical space via the inverse FFT.
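
The three steps described above (FFT, truncated complex channel mixing with einsum, inverse FFT) can be sketched for the 1D case. spectral_conv_1d is a hypothetical helper, and random complex weights stand in for learned ones. Multi-dimensional implementations typically also keep modes at negative frequencies along the non-last axes, which this 1D sketch does not need.

```python
import torch

def spectral_conv_1d(x, weights, modes):
    """Minimal 1D spectral convolution sketch.

    x: real tensor of shape (batch, in_channels, n)
    weights: complex tensor of shape (in_channels, out_channels, modes)
    """
    batch, _, n = x.shape
    out_channels = weights.shape[1]
    x_ft = torch.fft.rfft(x, dim=-1)  # (batch, in_channels, n // 2 + 1)
    out_ft = torch.zeros(batch, out_channels, n // 2 + 1, dtype=x_ft.dtype)
    # Mix channels independently for each retained low-frequency mode; higher modes stay zero
    out_ft[:, :, :modes] = torch.einsum("bim,iom->bom", x_ft[:, :, :modes], weights)
    return torch.fft.irfft(out_ft, n=n, dim=-1)  # back to physical space with length n

x = torch.randn(3, 4, 64)
w = torch.randn(4, 8, 10, dtype=torch.cfloat)
print(spectral_conv_1d(x, w, modes=10).shape)  # torch.Size([3, 8, 64])
```

A quick sanity check: with one channel, all 33 modes kept, and weights equal to 1, the sketch reduces to irfft(rfft(x)) and returns x unchanged.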

Example

>>> import torch
>>> from deepchem.models.torch_models.layers import SpectralConv
>>> # Create a 2D spectral convolution layer
>>> layer = SpectralConv(in_channels=3, out_channels=16, modes=8, dims=2)
>>> # Input: batch_size=2, channels=3, height=32, width=32
>>> x = torch.randn(2, 3, 32, 32)
>>> # Apply spectral convolution
>>> output = layer(x)
>>> # Check output shape
>>> output.shape
torch.Size([2, 16, 32, 32])
>>>
>>> # Create a 1D spectral convolution layer
>>> layer_1d = SpectralConv(in_channels=4, out_channels=8, modes=10, dims=1)
>>> # Input: batch_size=3, channels=4, sequence_length=64
>>> x_1d = torch.randn(3, 4, 64)
>>> # Apply 1D spectral convolution
>>> output_1d = layer_1d(x_1d)
>>> # Check output shape
>>> output_1d.shape
torch.Size([3, 8, 64])

__init__(in_channels: int, out_channels: int, modes: int | Tuple[int, ...] | List[int], dims: int = 2)[source]¶

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
modes (int or tuple of ints) – Either an int (same number of modes in every dimension) or a tuple of ints (number of modes per spatial dimension).
dims (int, default 2) – Number of spatial dimensions (typically 1, 2, or 3).

Returns:

Output tensor of shape (batch_size, out_channels, *spatial_dims). The tensor contains the result of applying spectral convolution in the Fourier domain and transforming back to the spatial domain.

Return type:

torch.Tensor

forward(x: Tensor) → Tensor[source]¶

Parameters:: x (torch.Tensor) – Input tensor of shape (batch, in_channels, *spatial_dims).
Returns:: Output tensor of shape (batch, out_channels, *spatial_dims).
Return type:: torch.Tensor

class Stem(in_channels: int, out_channels: int)[source]¶

Implements the Stem Layer as defined in https://arxiv.org/abs/1710.02238.

This layer serves as the initial processing block in ChemCeption, downsampling input images to reduce computational complexity before they pass through deeper network layers. The convolutional layer with stride 2 helps in feature extraction while reducing spatial dimensions.
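
The downsampling in the example below (32x32 to 15x15) follows the standard convolution output-size rule, floor((H + 2p - k) / s) + 1. The sketch assumes a 4x4 kernel, stride 2, and no padding; note that a 3x3 kernel with stride 2 gives the same 15x15 size, so the shape alone does not determine the kernel DeepChem uses.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel, stride, padding=0):
    # floor((size + 2 * padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out_size(32, kernel=4, stride=2))  # 15
print(conv_out_size(32, kernel=3, stride=2))  # 15 as well

# Assumed stem-like convolution: 3 -> 4 channels, 4x4 kernel, stride 2, no padding
conv = nn.Conv2d(3, 4, kernel_size=4, stride=2)
out = conv(torch.rand(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 4, 15, 15])
```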

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import Stem
>>> in_channels = 3
>>> out_channels = 4
>>> input_tensor = np.random.rand(1, in_channels, 32, 32).astype(np.float32)  # (Batch, Channels, Height, Width)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = Stem(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 4, 15, 15])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Stem layer.

Parameters:

in_channels (int) – The number of channels in the input tensor.
out_channels (int) – The number of filters applied in the convolution operation.

forward(inputs: Tensor) → Tensor[source]¶

Forward pass of the Stem layer.

Parameters:: inputs (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W).
Returns:: Output tensor of shape (batch_size, out_channels, H_out, W_out), where H_out and W_out are reduced due to downsampling. The output is a feature map with extracted spatial representations.
Return type:: torch.Tensor

class InceptionResnetA(in_channels: int, out_channels: int)[source]¶

Implements the Inception-ResNet-A block from the Inception-ResNet architecture as described in https://arxiv.org/abs/1710.0223.

This block combines multiple convolutional branches with varying receptive fields, concatenates their outputs, projects them back to the input dimensions using a 1x1 convolution, and adds the result to the original input (residual connection). A ReLU activation is applied at the end.
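
The branch, concatenate, project, add, ReLU pattern can be sketched as a small module. ResidualInceptionSketch is hypothetical: the three branches (1x1; 1x1 then 3x3; 1x1 then two 3x3) and their widths are assumptions in the spirit of Inception-ResNet-A, not DeepChem's exact configuration. The key design point is the 1x1 projection back to in_channels, which makes the residual addition shape-compatible.

```python
import torch
import torch.nn as nn

class ResidualInceptionSketch(nn.Module):
    """Hypothetical Inception-ResNet-A-style block; branch layouts are assumptions."""

    def __init__(self, in_channels, branch_channels):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, branch_channels, 1)
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_channels, branch_channels, 1), nn.ReLU(),
            nn.Conv2d(branch_channels, branch_channels, 3, padding=1))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_channels, branch_channels, 1), nn.ReLU(),
            nn.Conv2d(branch_channels, branch_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(branch_channels, branch_channels, 3, padding=1))
        # 1x1 projection back to in_channels so the residual sum has matching shapes
        self.project = nn.Conv2d(3 * branch_channels, in_channels, 1)

    def forward(self, x):
        mixed = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return torch.relu(x + self.project(mixed))  # residual connection, then ReLU

x = torch.rand(1, 64, 28, 28)
out = ResidualInceptionSketch(64, 32)(x)
print(out.shape)  # torch.Size([1, 64, 28, 28])
```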

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import InceptionResnetA
>>> in_channels = 64
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 28, 28).astype(np.float32) # (Batch, Channels, Height, Width)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = InceptionResnetA(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 64, 28, 28])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Inception-ResNet-A block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of filters in the convolutional branches.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Inception-ResNet-A block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W).
Returns:: Output tensor of the same shape as input (batch_size, in_channels, H, W), after applying the Inception-ResNet-A transformations and residual connection.
Return type:: torch.Tensor

class InceptionResnetB(in_channels: int, out_channels: int)[source]¶

Implements the Inception-ResNet-B block from the Inception-ResNet architecture as described in https://arxiv.org/abs/1710.0223.

This block consists of two parallel branches:

- A simple 1x1 convolution.
- A deeper sequence of asymmetric convolutions (1x7 followed by 7x1), which learns a large receptive field efficiently.

Outputs from both branches are concatenated and passed through a 1x1 convolution to project back to the original input dimension, and added to the input (residual connection). A ReLU activation follows.
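
The efficiency of the asymmetric 1x7 then 7x1 pair can be made concrete: it covers the same 7x7 receptive field as a full 7x7 convolution but with 2 * 7 = 14 weights per input/output channel pair instead of 49. The sketch below only compares these two layouts in isolation. The channel count of 32 is arbitrary, and biases are disabled so the counts are easy to read.

```python
import torch
import torch.nn as nn

channels = 32
full = nn.Conv2d(channels, channels, kernel_size=7, padding=3, bias=False)
asym = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(1, 7), padding=(0, 3), bias=False),
    nn.Conv2d(channels, channels, kernel_size=(7, 1), padding=(3, 0), bias=False))

def n_params(module):
    return sum(p.numel() for p in module.parameters())

x = torch.rand(1, channels, 28, 28)
print(full(x).shape, asym(x).shape)     # both torch.Size([1, 32, 28, 28])
print(n_params(full), n_params(asym))   # 50176 14336
```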

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import InceptionResnetB
>>> in_channels = 64
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 28, 28).astype(np.float32)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = InceptionResnetB(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 64, 28, 28])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Inception-ResNet-B block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of filters used in the convolutional branches.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Inception-ResNet-B block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W)
Returns:: Output tensor of shape (batch_size, in_channels, H, W)
Return type:: torch.Tensor

class FNOBlock(width: int, modes: int | Tuple[int, ...], dims: int)[source]¶

A single Fourier Neural Operator block.

This block combines spectral convolution in Fourier space with a standard convolution to learn both global and local features.

Spectral convolution is a key component of Fourier Neural Operators (FNOs). It leverages the Fourier transform to perform convolution in the frequency domain, which allows it to capture global, long-range dependencies in the input data efficiently. The operation consists of three steps:

1. Transform the input to the frequency domain using the Fast Fourier Transform (FFT).
2. Apply a linear transformation to a truncated set of lower-frequency modes.
3. Transform the result back to the spatial domain using the inverse FFT.

By operating in the frequency domain, spectral convolutions can learn global patterns without the large kernels or deep stacks of local convolutions that traditional CNNs need to reach a global receptive field.

This is because each Fourier mode represents a sinusoidal function that spans the entire spatial domain (meaning the entire input). Their coefficients in the frequency domain contain information about the overall structure of the input. By manipulating the coefficients of these modes in the frequency domain, the spectral convolution can model relationships between distant points in the input, effectively capturing global dependencies.
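
The global-versus-local argument can be illustrated numerically. The sketch below perturbs a single input point and counts how many output positions change under two operations: a truncated spectral multiplication, and a 3-tap local convolution. Random complex weights stand in for learned ones. This is an illustration of the reasoning only, not DeepChem's FNOBlock.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, modes = 64, 8
x = torch.randn(1, 1, n)
x_pert = x.clone()
x_pert[0, 0, 0] += 1.0  # perturb a single spatial point

w = torch.randn(modes, dtype=torch.cfloat)  # stand-in for learned spectral weights
kernel = torch.randn(1, 1, 3)               # a local 3-tap convolution kernel

def spectral(u):
    u_ft = torch.fft.rfft(u, dim=-1)
    out_ft = torch.zeros_like(u_ft)
    out_ft[..., :modes] = u_ft[..., :modes] * w  # keep and reweight the lowest modes
    return torch.fft.irfft(out_ft, n=n, dim=-1)

def local(u):
    return F.conv1d(u, kernel, padding=1)

changed_spectral = int(((spectral(x_pert) - spectral(x)).abs() > 1e-6).sum())
changed_local = int(((local(x_pert) - local(x)).abs() > 1e-6).sum())
print(changed_spectral, changed_local)  # most of the 64 positions vs. only 2
```

Every retained Fourier mode spans the whole domain, so the one-point change spreads to (almost) every output position. The 3-tap kernel only reaches the perturbed point and its neighbor.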

The forward pass computes: FNO_block(x) = ReLU(SpectralConv(x) + Conv(x))

Example

>>> import torch
>>> from deepchem.models.torch_models.fno import FNOBlock
>>> block = FNOBlock(width=128, modes=8, dims=2)
>>> x = torch.randn(1, 128, 16, 16)
>>> output = block(x)

__init__(width: int, modes: int | Tuple[int, ...], dims: int) → None[source]¶

Initialize the FNO block.

Parameters:

width (int) – Number of channels/features in the block
modes (int or tuple) – Number of Fourier modes to keep in spectral convolution
dims (int) – Spatial dimensionality (1, 2, or 3)

forward(x: Tensor) → Tensor[source]¶

Forward pass through the FNO block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch, width, *spatial_dims)
Returns:: Output tensor of same shape as input after spectral and local convolution
Return type:: torch.Tensor

class InceptionResnetC(in_channels: int, out_channels: int)[source]¶

Implements the Inception-ResNet-C block from the Inception-ResNet architecture as described in https://arxiv.org/pdf/1706.06689.

This block consists of two parallel branches:

- A simple 1x1 convolution.
- A deeper sequence of 1x1 → 1x3 → 3x1 asymmetric convolutions, which expands the receptive field efficiently while using fewer parameters.

The outputs from both branches are concatenated, passed through a 1x1 convolution to project the result back to the input dimensionality, and then added to the original input via a residual connection. A ReLU activation follows the addition.

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import InceptionResnetC
>>> in_channels = 64
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 28, 28).astype(np.float32)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = InceptionResnetC(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 64, 28, 28])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Inception-ResNet-C block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of filters used in the convolutional branches.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Inception-ResNet-C block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W)
Returns:: Output tensor of shape (batch_size, in_channels, H, W)
Return type:: torch.Tensor

class ReductionA(in_channels: int, out_channels: int)[source]¶

Implements the Reduction-A block from the Inception-ResNet architecture as described in https://arxiv.org/pdf/1706.06689.

This block reduces the spatial dimensions of the input tensor while increasing the number of feature channels. It consists of three parallel branches:

- A max pooling layer.
- A single 3x3 convolution with stride 2.
- A sequence of 1x1 → 3x3 → 3x3 convolutions, ending with stride 2.

The outputs from all branches are concatenated along the channel dimension.
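
The channel and spatial arithmetic of such a block can be checked with a hypothetical re-creation. ReductionSketch and its settings are assumptions chosen so that the output matches the (1, 99, 1, 1) shape in the example below: out_channels for the single 3x3 branch, 2 * out_channels at the end of the deep branch, and padding 1 on the middle 3x3 convolution. This gives 3 pooled input channels + 32 + 64 = 99. DeepChem's actual ReductionA may be configured differently.

```python
import torch
import torch.nn as nn

class ReductionSketch(nn.Module):
    """Hypothetical Reduction-A-style block; branch widths and padding are assumptions."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2)  # keeps in_channels
        self.conv = nn.Conv2d(in_channels, out_channels, 3, stride=2)
        self.deep = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1), nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_channels, 2 * out_channels, 3, stride=2))

    def forward(self, x):
        # Concatenate along channels: in_channels + out_channels + 2 * out_channels
        return torch.cat([self.pool(x), self.conv(x), self.deep(x)], dim=1)

out = ReductionSketch(3, 32)(torch.rand(1, 3, 4, 4))
print(out.shape)  # torch.Size([1, 99, 1, 1])
```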

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import ReductionA
>>> in_channels = 3
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 4, 4).astype(np.float32)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = ReductionA(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 99, 1, 1])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Reduction-A block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Base number of output filters used in the convolutional branches.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Reduction-A block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W)
Returns:: Output tensor with reduced spatial dimensions and increased channels.
Return type:: torch.Tensor

class ReductionB(in_channels: int, out_channels: int)[source]¶

Implements the Reduction-B block from the Inception-ResNet architecture as described in https://arxiv.org/pdf/1706.06689.

This block aggressively reduces the spatial dimensions while using multiple convolutional branches to preserve rich feature representations.

The four branches are:

- Max pooling with stride 2.
- 1x1 → 3x3 convolution (stride 2).
- Another 1x1 → 3x3 convolution (stride 2) with different filter scaling.
- A deeper asymmetric 1x1 → 3x1 → 3x3 convolution path (stride 2).

The outputs are concatenated along the channel dimension and passed through a final ReLU.

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import ReductionB
>>> in_channels = 3
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 4, 4).astype(np.float32)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = ReductionB(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 127, 1, 1])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Reduction-B block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Base number of output filters for each convolutional branch.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Reduction-B block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W)
Returns:: Output tensor with reduced spatial dimensions and increased channels.
Return type:: torch.Tensor

Flow Layers¶

class Flow[source]¶

Generic class for flow functions

Flows [flow1] should satisfy several conditions in order to be practical. They should:

be invertible: sampling requires the generative direction g (from the base distribution to the data), while computing likelihoods requires its inverse f;
be sufficiently expressive to model the distribution of interest;
be computationally efficient, both in computing f and g (depending on the application) and in computing the determinant of the Jacobian.

Flow layers are generally used as part of a Normalizing Flow model, a generative model that learns a target distribution by transforming a simple base distribution through a series of invertible transformations. The target distribution is the distribution of base samples after they pass through the composed transformations, and its density can be computed with the change-of-variables formula.
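
For a generator g with inverse f, the change-of-variables formula gives log p_X(x) = log p_Z(f(x)) + log|det ∂f/∂x|. The minimal sketch below uses a hand-picked one-dimensional affine generator g(z) = 2z + 0.5 rather than any DeepChem flow. Because an affine map of a standard normal is again normal, the result can be checked against torch.distributions.Normal(0.5, 2.0).

```python
import math
import torch
from torch.distributions import Normal

base = Normal(0.0, 1.0)            # base distribution p_Z
scale, shift = 2.0, 0.5            # generator g(z) = scale * z + shift
x = torch.tensor([0.0, 1.0, -3.0])

z = (x - shift) / scale            # f(x) = g^{-1}(x)
log_det_f = -math.log(abs(scale))  # log |df/dx| for this affine map
log_px = base.log_prob(z) + log_det_f

reference = Normal(shift, scale).log_prob(x)  # closed form for an affine map of N(0, 1)
print(torch.allclose(log_px, reference))      # True
```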

References

[flow1]

Kobyzev, I., Prince, S. J., & Brubaker, M. A. (2020). Normalizing flows: An introduction and review of current methods. IEEE transactions on pattern analysis and machine intelligence, 43(11), 3964-3979.

__init__()[source]¶: Initializes the flow function

forward(z: Tensor) → Tuple[Tensor, Tensor][source]¶

Forward pass of the flow

Parameters:

z (torch.Tensor) – Input tensor

Returns:

z_ (torch.Tensor) – Transformed tensor
log_det (torch.Tensor) – Logarithm of the determinant of the Jacobian of the transformation

inverse(z: Tensor) → Tuple[Tensor, Tensor][source]¶

Inverse pass of the flow

Parameters:

z (torch.Tensor) – Input tensor

Returns:

z_ (torch.Tensor) – Transformed tensor
log_det (torch.Tensor) – Logarithm of the determinant of the Jacobian of the transformation

class Affine(dim: int)[source]¶

Class which performs the Affine transformation.

This transformation maps samples from the base distribution toward the target distribution with an element-wise affine map: learnable parameters rescale and shift the inputs.

Normalizing Flow transformations must be bijective so that densities can be computed with the change-of-variables formula, which uses the logarithm of the Jacobian determinant. For this reason, each transformation implements both a forward and an inverse pass.

Example

>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import Affine
>>> import torch
>>> from torch.distributions import MultivariateNormal
>>> # initialize the transformation layer's parameters
>>> dim = 2
>>> samples = 96
>>> transforms = Affine(dim)
>>> # forward pass based on a given distribution
>>> distribution = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> input = distribution.sample(torch.Size((samples, dim)))
>>> len(transforms.forward(input))
2
>>> # inverse pass based on a distribution
>>> len(transforms.inverse(input))
2

__init__(dim: int) → None[source]¶

Create an Affine transform layer.

Parameters:: dim (int) – Dimensionality of the input data (the size of its last dimension).

forward(x: Tensor) → Tuple[Tensor, Tensor][source]¶

Transforms samples from the initial distribution toward the target distribution. This transformation computes:

\[y = x \odot \exp(a) + b\]

where a is the log-scale parameter and b is the shift. The method also returns the logarithm of the determinant of the Jacobian, which is needed to evaluate the density of the transformed samples with the change-of-variables formula.
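
Because the map is element-wise, its Jacobian is diagonal with entries exp(a), so log|det J| = sum(a). The sketch below checks this with torch.autograd.functional.jacobian and also confirms that x = (y - b) / exp(a) inverts the map. The parameters a and b are random stand-ins for the layer's learned parameters; this is not the DeepChem class itself.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
dim = 3
a = torch.randn(dim)  # log-scale parameters
b = torch.randn(dim)  # shift parameters

def affine_forward(x):
    return x * torch.exp(a) + b  # y = x * exp(a) + b, element-wise

def affine_inverse(y):
    return (y - b) * torch.exp(-a)  # x = (y - b) / exp(a)

x = torch.randn(dim)
J = jacobian(affine_forward, x)  # (dim, dim) diagonal matrix with entries exp(a)
print(torch.allclose(torch.logdet(J), a.sum(), atol=1e-5))              # True
print(torch.allclose(affine_inverse(affine_forward(x)), x, atol=1e-5))  # True
```

The inverse pass therefore has log-determinant -sum(a), the negative of the forward one.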

Parameters:

x (torch.Tensor) – Tensor sample with the initial distribution data which will pass into the normalizing flow algorithm.

Returns:

y (torch.Tensor) – Transformed tensor according to Affine layer with the shape of ‘x’.
log_det_jacobian (torch.Tensor) – Logarithm of the determinant of the Jacobian of the transformation.

inverse(y: Tensor) → Tuple[Tensor, Tensor][source]¶

Performs the inverse of the forward transformation above, mapping samples from the target distribution back to the initial distribution:

\[x = (y - b) / \exp(a)\]

where a is the log-scale parameter and b is the shift. The method also returns the logarithm of the determinant of the Jacobian of the inverse transformation.

Parameters:

y (torch.Tensor) – Sample from the transformed distribution, to be mapped back by the inverse pass of the normalizing flow.

Returns:

x (torch.Tensor) – Transformed tensor according to Affine layer with the shape of ‘y’.
inverse_log_det_jacobian (torch.Tensor) – Logarithm of the determinant of the Jacobian of the inverse transformation.

class MaskedAffineFlow(b, t=None, s=None)

This class implements the Masked Affine Flow layer.

The Masked Affine Flow [maskedaffine1] layer is a normalizing flow layer used to learn a target distribution. It is an affine flow layer with a binary mask b applied to the input data: b has the same size as the input and is filled with 0s and 1s. Features where b is 1 pass through unchanged and are fed to the scale network s and the translation network t; features where b is 0 are transformed. The layer is defined as:

\[f(z) = b \odot z + (1 - b) \odot \left(z \odot \exp(s(b \odot z)) + t(b \odot z)\right)\]
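
The masked affine (coupling) transformation above can be sketched directly from the formula, together with its inverse and log-determinant. Because s and t only see the unchanged features b ⊙ z, the inverse can recompute them exactly, and the log-determinant is the sum of s over the transformed features. The networks s_net and t_net and the functions coupling_forward and coupling_inverse are hypothetical; this is not DeepChem's MaskedAffineFlow.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 4
b = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = pass through unchanged, 0 = transform
s_net = nn.Sequential(nn.Linear(dim, 16), nn.Tanh(), nn.Linear(16, dim))
t_net = nn.Sequential(nn.Linear(dim, 16), nn.Tanh(), nn.Linear(16, dim))

def coupling_forward(z):
    z_masked = b * z
    s = (1 - b) * s_net(z_masked)  # scale only acts on transformed features
    t = (1 - b) * t_net(z_masked)
    y = z_masked + (1 - b) * (z * torch.exp(s) + t)
    return y, s.sum(dim=1)  # log|det J| = sum of s over transformed features

def coupling_inverse(y):
    y_masked = b * y  # equals b * z, because masked features were not changed
    s = (1 - b) * s_net(y_masked)
    t = (1 - b) * t_net(y_masked)
    z = y_masked + (1 - b) * (y - t) * torch.exp(-s)
    return z, -s.sum(dim=1)

z = torch.randn(5, dim)
y, log_det = coupling_forward(z)
z_rec, inv_log_det = coupling_inverse(y)
print(torch.allclose(z_rec, z, atol=1e-5))  # True
```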

Example

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> from deepchem.models.torch_models.flows import MaskedAffineFlow
>>> from torch.distributions import MultivariateNormal

>>> dim = 2
>>> samples = 96
>>> data = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> tensor = data.sample(torch.Size((samples, dim)))

>>> layers = 4
>>> hidden_size = 16
>>> masks = F.one_hot(torch.tensor([i % 2 for i in range(layers)])).float()

>>> s_func = nn.Sequential(
...     nn.Linear(in_features=dim, out_features=hidden_size), nn.LeakyReLU(),
...     nn.Linear(in_features=hidden_size, out_features=hidden_size),
...     nn.LeakyReLU(), nn.Linear(in_features=hidden_size, out_features=dim))

>>> t_func = nn.Sequential(
...     nn.Linear(in_features=dim, out_features=hidden_size), nn.LeakyReLU(),
...     nn.Linear(in_features=hidden_size, out_features=hidden_size),
...     nn.LeakyReLU(), nn.Linear(in_features=hidden_size, out_features=dim))

>>> layers = nn.ModuleList(
...     [MaskedAffineFlow(mask, t=t_func, s=s_func) for mask in masks])

>>> for layer in layers:
...   _, inverse_log_det_jacobian = layer.inverse(tensor)
...   inverse_log_det_jacobian = inverse_log_det_jacobian.detach().numpy()
>>> len(inverse_log_det_jacobian)
96

References

[maskedaffine1]

Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016). Density estimation using real nvp. arXiv preprint arXiv:1605.08803.

Initializes the Masked Affine Flow layer

Parameters:

b (torch.Tensor) – mask for features, i.e. tensor of same size as latent data point filled with 0s and 1s
t (Optional[Union[torch.nn.ModuleList, torch.nn.Sequential, torch.nn.Module]], optional) – translation mapping, i.e. neural network, where first input dimension is batch dim, if None no translation is applied
s (Optional[Union[torch.nn.ModuleList, torch.nn.Sequential, torch.nn.Module]], optional) – scale mapping, i.e. neural network, where first input dimension is batch dim, if None no scale is applied

forward(z: Tensor) → Tuple[Tensor, Tensor][source]¶

Forward pass of the Masked Affine Flow layer

Parameters:

z (torch.Tensor) – Input tensor

Returns:

z (torch.Tensor) – Transformed tensor according to Masked Affine Flow layer with the shape of ‘z’.
log_det (torch.Tensor) – Logarithm of the determinant of the Jacobian of the transformation.

inverse(z: Tensor) → Tuple[Tensor, Tensor][source]¶

Inverse pass of the Masked Affine Flow layer

Parameters:

z (torch.Tensor) – Input tensor

Returns:

z_ (torch.Tensor) – Transformed tensor according to Masked Affine Flow layer with the shape of ‘z’.
log_det (torch.Tensor) – Logarithm of the determinant of the Jacobian of the inverse transformation.

class ActNorm(*args, **kwargs)[source]¶

This class implements the ActNorm (activation normalization) layer.

ActNorm is an Affine layer with a data-dependent initialization: on the very first batch, the scale and shift are initialized so that the output has zero mean and unit variance per channel, as described by Kingma and Dhariwal (2018) [glow].

ActNorm is a layer that performs an affine transformation of the activations using a scale and bias parameter per channel, similar to batch normalization. These parameters are initialized such that the post-actnorm activations per-channel have zero mean and unit variance given an initial minibatch of data. This is a form of data dependent initialization [weight_norm]. After initialization, the scale and bias are treated as regular trainable parameters that are independent of the data.
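
The data-dependent initialization can be sketched as follows. On the first call, the per-channel mean and standard deviation of the batch set the shift and log-scale so that the output is standardized. Afterwards, both are ordinary trainable parameters. ActNormSketch is hypothetical, and details such as the epsilon and the use of the unbiased standard deviation are assumptions rather than DeepChem's choices.

```python
import torch

class ActNormSketch(torch.nn.Module):
    """Hypothetical ActNorm: affine layer with data-dependent initialization."""

    def __init__(self, dim):
        super().__init__()
        self.log_scale = torch.nn.Parameter(torch.zeros(dim))
        self.shift = torch.nn.Parameter(torch.zeros(dim))
        self.initialized = False

    def forward(self, z):
        if not self.initialized:  # only the very first batch sets the parameters
            with torch.no_grad():
                mean = z.mean(dim=0)
                std = z.std(dim=0) + 1e-6
                self.log_scale.copy_(-torch.log(std))  # exp(log_scale) = 1 / std
                self.shift.copy_(-mean / std)
            self.initialized = True
        y = z * torch.exp(self.log_scale) + self.shift  # (z - mean) / std on the first batch
        log_det = self.log_scale.sum().expand(z.shape[0])  # same value for every sample
        return y, log_det

torch.manual_seed(0)
z = torch.randn(256, 3) * 5.0 + 2.0
layer = ActNormSketch(3)
y, log_det = layer(z)
print(y.mean(dim=0), y.std(dim=0))  # approximately 0 and 1 per channel
```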

Examples

Importing necessary libraries

>>> import torch
>>> from deepchem.models.torch_models.flows import ActNorm
>>> from torch.distributions import MultivariateNormal

Creating sample data

>>> dim = 2
>>> samples = 96
>>> data = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> tensor = data.sample(torch.Size((samples, dim)))

Initializing the ActNorm layer and performing forward and inverse pass

>>> actnorm = ActNorm(dim)
>>> _, log_det_jacobian = actnorm.forward(tensor)
>>> _, inverse_log_det_jacobian = actnorm.inverse(tensor)
>>> len(inverse_log_det_jacobian)
96

References

[glow]

Kingma, D. P., & Dhariwal, P. (2018). Glow: Generative flow with invertible 1x1 convolutions. Advances in neural information processing systems, 31.

[weight_norm]

Salimans, T., & Kingma, D. P. (2016). Weight normalization: A simple reparameterization to accelerate training of deep neural networks. Advances in neural information processing systems, 29.

__init__(*args, **kwargs) → None[source]¶: Initializes the ActNorm layer

forward(z: Tensor) → Tuple[Tensor, Tensor][source]¶

Forward pass of the ActNorm layer

Parameters:

z (torch.Tensor) – Input tensor

Returns:

z_ (torch.Tensor) – Transformed tensor according to ActNorm layer with the shape of ‘z’.
log_det (torch.Tensor) – Log-determinant of the Jacobian of the forward transformation. It accounts for the change in log density between the initial and target distributions.

inverse(z: Tensor) → Tuple[Tensor, Tensor][source]¶

Inverse pass of the ActNorm layer

Parameters:

z (torch.Tensor) – Input tensor

Returns:

z_ (torch.Tensor) – Transformed tensor according to ActNorm layer with the shape of ‘z’.
log_det (torch.Tensor) – Log-determinant of the Jacobian of the inverse transformation. It accounts for the change in log density between the initial and target distributions.

class ClampExp(lambda_param: float = 1.0)[source]¶

A nonlinearity layer that clamps the input tensor by taking the minimum of 1 and the exponential of the input multiplied by a lambda parameter.

\[f(x) = \min(\exp(\lambda x), 1)\]

Example

>>> import torch
>>> from deepchem.models.torch_models.flows import ClampExp
>>> lambda_param = 1.0
>>> clamp_exp = ClampExp(lambda_param)
>>> input = torch.tensor([-1, 0.5, 0.6, 0.7])
>>> clamp_exp(input)
tensor([0.3679, 1.0000, 1.0000, 1.0000])

__init__(lambda_param: float = 1.0) → None[source]¶

Initializes the ClampExp layer

Parameters:: lambda_param (float) – Lambda parameter for the ClampExp layer

forward(x: Tensor) → Tensor[source]¶

Forward pass of the ClampExp layer

Parameters:: x (torch.Tensor) – Input tensor
Returns:: Transformed tensor according to ClampExp layer with the shape of ‘x’.
Return type:: torch.Tensor

class ConstScaleLayer(scale: float = 1.0)[source]¶

This layer scales the input tensor by a fixed factor.

Example

>>> import torch
>>> from deepchem.models.torch_models.flows import ConstScaleLayer
>>> scale = 2.0
>>> const_scale = ConstScaleLayer(scale)
>>> input = torch.tensor([1, 2, 3])
>>> const_scale(input)
tensor([2., 4., 6.])

__init__(scale: float = 1.0)[source]¶

Initializes the ConstScaleLayer

Parameters:: scale (float) – Scaling factor

forward(input: Tensor) → Tensor[source]¶

Forward pass of the ConstScaleLayer

Parameters:: input (torch.Tensor) – Input tensor
Returns:: Scaled tensor
Return type:: torch.Tensor

class MLPFlow(layers: list, leaky: float = 0.0, score_scale: float | None = None, output_fn=None, output_scale: float | None = None, init_zeros: bool = False, dropout: float | None = None)[source]¶

A multi-layer perceptron (MLP) used as a component of a normalizing flow model. It is a modified version of the MLP model in deepchem/deepchem/models/torch_models/layers.py that supports an arbitrary list of layer sizes.

Example

>>> import torch
>>> from deepchem.models.torch_models.flows import MLPFlow
>>> layers = [2, 4, 4, 2]
>>> mlpflow = MLPFlow(layers)
>>> input = torch.tensor([1., 2.])
>>> output = mlpflow(input)
>>> output.shape
torch.Size([2])

__init__(layers: list, leaky: float = 0.0, score_scale: float | None = None, output_fn=None, output_scale: float | None = None, init_zeros: bool = False, dropout: float | None = None)[source]¶

Initializes the MLPFlow model

Parameters:

layers (list) – List of layer sizes from start to end
leaky (float, optional, default 0.0) – Slope of the leaky part of the ReLU; if 0.0, a standard ReLU is used
score_scale (float, optional) – Factor to apply to the scores, i.e. output before output_fn
output_fn (str, optional) – Function to be applied to the output, either None, “sigmoid”, “relu”, “tanh”, or “clampexp”
output_scale (float, optional) – Rescale outputs if output_fn is specified, i.e. scale * output_fn(out / scale)
init_zeros (bool, optional) – If True, the weights and biases of the last layer are initialized with zeros (helpful for deep models, see arXiv 1807.03039)
dropout (float, optional) – If specified, dropout is done before last layer; if None, no dropout is done
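The interplay of `score_scale`, `output_fn` and `output_scale` described above can be sketched element-wise in plain Python. The helper `mlp_output` and the `OUTPUT_FNS` table are hypothetical. The sketch follows the docstring's formula scale * output_fn(out / scale) and assumes that `score_scale` multiplies the raw output. With `tanh` and `output_scale=2.0`, outputs are bounded in (-2, 2) but stay close to the identity for small values.

```python
import math
from typing import Callable, Dict, Optional


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Element-wise output functions named in the docstring (hypothetical table).
OUTPUT_FNS: Dict[str, Callable[[float], float]] = {
    "sigmoid": _sigmoid,
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "clampexp": lambda x: math.exp(min(x, 0.0)),  # == min(exp(x), 1), ClampExp with lambda 1
}


def mlp_output(out: float, score_scale: Optional[float] = None,
               output_fn: Optional[str] = None,
               output_scale: Optional[float] = None) -> float:
    """Post-process one raw MLP output as the docstring describes."""
    if score_scale is not None:
        out = out * score_scale                        # scale the scores before output_fn
    if output_fn is None:
        return out
    f = OUTPUT_FNS[output_fn]
    if output_scale is not None:
        return output_scale * f(out / output_scale)    # scale * output_fn(out / scale)
    return f(out)


print(mlp_output(100.0, output_fn="tanh", output_scale=2.0))  # close to the bound 2.0
```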

forward(x: Tensor) → Tensor[source]¶

Forward pass of the MLPFlow model

Parameters:: x (torch.Tensor) – Input tensor
Returns:: Transformed tensor according to the MLPFlow model; its last dimension is layers[-1], so it has the shape of ‘x’ when layers[0] == layers[-1]
Return type:: torch.Tensor

class NormalizingFlow(transform: Sequence, base_distribution: Distribution, dim: int)[source]¶

Normalizing flows are widely used as generative models. They offer an advantage over variational autoencoders (VAEs) in that sampling is easy, because the model applies invertible transformations (Frey, Gadepally, & Ramsundar, 2022).

Example

>>> import deepchem as dc
>>> from deepchem.models.torch_models.flows import Affine
>>> from deepchem.models.torch_models.normalizing_flows_pytorch import NormalizingFlow
>>> import torch
>>> from torch.distributions import MultivariateNormal
>>> # initialize the transformation layer's parameters
>>> dim = 2
>>> samples = 96
>>> transforms = [Affine(dim)]
>>> distribution = MultivariateNormal(torch.zeros(dim), torch.eye(dim))
>>> # initialize normalizing flow model
>>> model = NormalizingFlow(transforms, distribution, dim)
>>> # evaluate the log_prob when applying the transformation layers
>>> input = distribution.sample(torch.Size((samples, dim)))
>>> len(model.log_prob(input))
96
>>> # evaluate the sampling method and its log_prob
>>> len(model.sample(samples))
2

__init__(transform: Sequence, base_distribution: Distribution, dim: int) → None[source]¶

This class considers a transformation, or a composition of transformation functions (layers), between a base distribution and a target distribution.

Parameters:

transform (Sequence) – Bijective transformation/transformations which are considered the layers of a Normalizing Flow model.
base_distribution (torch.distributions.Distribution) – Base probability distribution from which the algorithm starts. A multivariate normal distribution is typically used.
dim (int) – Dimensionality of each data sample, i.e. the size of the last dimension of the dataset.

log_prob(inputs: Tensor) → Tensor[source]¶

This method computes the log probability of the inputs under the learned distribution, i.e. after the transformation/transformations are applied.

Parameters:: inputs (torch.Tensor) – Tensor used to evaluate the log_prob computation of the learned distribution. shape: (samples, dim)
Returns:: log_prob – This tensor contains the value of the log probability computed. shape: (samples)
Return type:: torch.Tensor
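`log_prob` rests on the change-of-variables formula log p_X(x) = log p_Z(f^{-1}(x)) + log|det ∂f^{-1}/∂x|. The plain-Python sketch below applies it to a single hypothetical element-wise affine layer x = scale * z + shift with a standard normal base distribution. It shows the principle only. DeepChem's method works on batched tensors through whatever transforms the model holds.

```python
import math
from typing import List

LOG_2PI = math.log(2.0 * math.pi)


def std_normal_log_prob(z: List[float]) -> float:
    """log N(z; 0, I) for one D-dimensional point."""
    return sum(-0.5 * zi * zi - 0.5 * LOG_2PI for zi in z)


def affine_flow_log_prob(x: List[float], scale: List[float], shift: List[float]) -> float:
    """Change of variables for the hypothetical layer x = scale * z + shift."""
    # Map the data point back to the base space: z = f^{-1}(x).
    z = [(xi - ci) / ai for xi, ai, ci in zip(x, scale, shift)]
    # The inverse map has log|det J| = -sum(log|scale_i|).
    inverse_log_det = -sum(math.log(abs(ai)) for ai in scale)
    return std_normal_log_prob(z) + inverse_log_det


x = [0.3, -1.2]
lp = affine_flow_log_prob(x, scale=[2.0, 0.5], shift=[1.0, -1.0])
```

For an affine layer, the result equals the log density of N(shift, scale^2) evaluated at x, which makes this a convenient sanity check.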

sample(n_samples: int) → Tuple[Tensor, Tensor][source]¶

Draws samples from the transformed distribution. Besides the samples, this method returns the log probability of obtaining those outputs, computed starting from the base distribution.

Parameters:: n_samples (int) – Number of samples to select from the transformed distribution
Returns:: sample – A tuple of two torch.Tensor objects. The first is a sample from the learned distribution, obtained after the transformations have been applied. The second contains the log probabilities of the transformed distribution. shape: ((n_samples, dim), (n_samples))
Return type:: tuple
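Sampling runs the flow in the forward direction: draw z from the base distribution, push it through each layer, and subtract each layer's forward log|det J| from log p_Z(z). The plain-Python sketch below does this with two hypothetical element-wise affine layers. It returns the density of the transformed samples. Whether DeepChem's second output is accumulated in exactly this way is an assumption, and `sample_flow` is a hypothetical helper.

```python
import math
import random
from typing import List, Tuple

LOG_2PI = math.log(2.0 * math.pi)
Affine = Tuple[List[float], List[float]]  # (scale, shift) of one hypothetical layer


def sample_flow(n_samples: int, dim: int, transforms: List[Affine],
                rng: random.Random) -> List[Tuple[List[float], float]]:
    """Sample z ~ N(0, I), apply each layer, and track log p(x) of the result."""
    results = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        log_prob = sum(-0.5 * zi * zi - 0.5 * LOG_2PI for zi in z)
        x = z
        for scale, shift in transforms:
            x = [a * xi + c for xi, a, c in zip(x, scale, shift)]
            # Each forward layer adds log|det J| = sum(log|a_i|); subtracting it
            # gives the density of the transformed point.
            log_prob -= sum(math.log(abs(a)) for a in scale)
        results.append((x, log_prob))
    return results


layers = [([2.0, 0.5], [0.0, 1.0]), ([1.5, -3.0], [0.2, 0.0])]
samples = sample_flow(4, 2, layers, random.Random(0))
```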

Grover Layers¶

The following layers are used for implementing the GROVER model as described in the paper Self-Supervised Graph Transformer on Large-Scale Molecular Data (https://drug.ai.tencent.com/publications/GROVER.pdf).

class GroverMPNEncoder(atom_messages: bool, init_message_dim: int, hidden_size: int, depth: int, undirected: bool, attach_feats: bool, attached_feat_fdim: int = 0, bias: bool = True, dropout: float = 0.2, activation: str = 'relu', input_layer: str = 'fc', dynamic_depth: str = 'none')[source]¶

Performs Message Passing to generate encodings for the molecule.

Parameters:

atom_messages (bool) – True if encoding atom-messages else False.
init_message_dim (int) – Dimension of embedding message.
attach_feats (bool) – Set to True if additional features are passed along with node/edge embeddings.
attached_feat_fdim (int) – Dimension of additional features when attach_feats is True
undirected (bool) – If set to True, the graph is considered as an undirected graph.
depth (int) – number of hops in a message passing iteration
dynamic_depth (str, default: none) – If set to uniform, the dynamic depth is sampled randomly from a uniform distribution; if set to truncnorm, it is sampled from a truncated normal distribution.
input_layer (str) – If set to fc, adds an initial feed-forward layer. If set to none, does not add an initial feed forward layer.
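The `b2a` and `b2revb` arguments of `forward` suggest directed, bond-level message passing similar to chemprop's D-MPNN. The sketch below shows one update without learned parameters, using scalar messages. It assumes chemprop-style indexing: `a2b` lists each atom's incoming bonds, `b2a` gives each bond's source atom, and `b2revb` gives each bond's reverse bond. The real encoder also applies learned weights, activations and dropout. `directed_message_step` is a hypothetical helper.

```python
from typing import List


def directed_message_step(messages: List[float], a2b: List[List[int]],
                          b2a: List[int], b2revb: List[int]) -> List[float]:
    """One un-parameterized D-MPNN-style update on scalar bond messages."""
    # Total message arriving at each atom.
    atom_in = [sum(messages[b] for b in incoming) for incoming in a2b]
    # New message on bond u->v: everything arriving at u, minus what v sent back to u.
    return [atom_in[b2a[b]] - messages[b2revb[b]] for b in range(len(messages))]


# Hypothetical 3-atom chain 0-1-2 with directed bonds
# b0: 0->1, b1: 1->0, b2: 1->2, b3: 2->1.
a2b = [[1], [0, 3], [2]]    # incoming bonds per atom
b2a = [0, 1, 1, 2]          # source atom per bond
b2revb = [1, 0, 3, 2]       # reverse bond per bond
new_messages = directed_message_step([1.0, 2.0, 3.0, 4.0], a2b, b2a, b2revb)
print(new_messages)  # [0.0, 4.0, 1.0, 0.0]
```

Subtracting the reverse message prevents information from bouncing straight back along the bond it came from. For example, bond 1->0 receives only what atom 1 got from atom 2.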

__init__(atom_messages: bool, init_message_dim: int, hidden_size: int, depth: int, undirected: bool, attach_feats: bool, attached_feat_fdim: int = 0, bias: bool = True, dropout: float = 0.2, activation: str = 'relu', input_layer: str = 'fc', dynamic_depth: str = 'none')[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(init_messages, init_attached_features, a2nei, a2attached, b2a=None, b2revb=None, adjs=None) → FloatTensor[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class GroverAttentionHead(hidden_size: int = 128, bias: bool = True, depth: int = 1, dropout: float = 0.0, undirected: bool = False, atom_messages: bool = False)[source]¶

Generates an attention head, using GroverMPNEncoder to generate the query, key and value.

Parameters:

hidden_size (int) – Dimension of hidden layer
undirected (bool) – If set to True, the graph is considered as an undirected graph.
depth (int) – number of hops in a message passing iteration
atom_messages (bool) – True if encoding atom-messages else False.

__init__(hidden_size: int = 128, bias: bool = True, depth: int = 1, dropout: float = 0.0, undirected: bool = False, atom_messages: bool = False)[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(f_atoms, f_bonds, a2b, a2a, b2a, b2revb)[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class GroverMTBlock(atom_messages: bool, input_dim: int, num_heads: int, depth: int, undirected: bool = False, hidden_size: int = 128, dropout: float = 0.0, bias: bool = True, res_connection: bool = True, activation: str = 'relu')[source]¶

Message passing combined with transformer architecture

The layer uses message passing, performed with GroverMPNEncoder, to generate the query, key and value for a multi-headed attention block.

Parameters:

atom_messages (bool) – True if encoding atom-messages else False.
input_dim (int) – Dimension of input features
num_heads (int) – Number of attention heads
depth (int) – Number of hops in a message passing iteration
undirected (bool) – If set to True, the graph is considered as an undirected graph.

__init__(atom_messages: bool, input_dim: int, num_heads: int, depth: int, undirected: bool = False, hidden_size: int = 128, dropout: float = 0.0, bias: bool = True, res_connection: bool = True, activation: str = 'relu')[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(batch)[source]¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class GroverTransEncoder(node_fdim: int, edge_fdim: int, depth: int = 3, undirected: bool = False, num_mt_block: int = 2, num_heads: int = 2, hidden_size: int = 64, dropout: float = 0.2, res_connection: bool = True, bias: bool = True, activation: str = 'relu')[source]¶

GroverTransEncoder for encoding a molecular graph

The GroverTransEncoder layer is used for encoding a molecular graph. The layer returns 4 outputs. They are atom messages aggregated from atom hidden states, atom messages aggregated from bond hidden states, bond messages aggregated from atom hidden states, bond messages aggregated from bond hidden states.
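The phrases "atom messages aggregated from atom hidden states" and "atom messages aggregated from bond hidden states" can be pictured as sums over neighbour index lists. The sketch below assumes sum aggregation and chemprop-style `a2a` (neighbouring atoms) and `a2b` (incoming directed bonds) lists. The actual layer may aggregate differently and applies further learned layers. `sum_aggregate` is a hypothetical helper.

```python
from typing import List

Vec = List[float]


def sum_aggregate(hidden: List[Vec], index: List[List[int]]) -> List[Vec]:
    """For every atom, sum the hidden vectors listed in its index row."""
    dim = len(hidden[0])
    out = []
    for neighbours in index:
        acc = [0.0] * dim
        for j in neighbours:
            acc = [a + h for a, h in zip(acc, hidden[j])]
        out.append(acc)
    return out


# Hypothetical 3-atom chain 0-1-2 with directed bonds b0: 0->1, b1: 1->0, b2: 1->2, b3: 2->1.
atom_hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bond_hidden = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
a2a = [[1], [0, 2], [1]]   # neighbouring atoms
a2b = [[1], [0, 3], [2]]   # incoming directed bonds

atom_from_atom = sum_aggregate(atom_hidden, a2a)
atom_from_bond = sum_aggregate(bond_hidden, a2b)
```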

Parameters:

hidden_size (int) – the hidden size of the model.
edge_fdim (int) – the dimension of additional feature for edge/bond.
node_fdim (int) – the dimension of additional feature for node/atom.
depth (int) – Dynamic message passing depth for use in MPNEncoder
undirected (bool) – The message passing is undirected or not
dropout (float) – the dropout ratio
activation (str) – the activation function
num_mt_block (int) – the number of MTBlocks.
num_heads (int) – the number of attention heads.
bias (bool) – enable bias term in all linear layers.
res_connection (bool) – enables the skip-connection in MTBlock.

__init__(node_fdim: int, edge_fdim: int, depth: int = 3, undirected: bool = False, num_mt_block: int = 2, num_heads: int = 2, hidden_size: int = 64, dropout: float = 0.2, res_connection: bool = True, bias: bool = True, activation: str = 'relu')[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(batch)[source]¶

Forward layer

Parameters:: batch (Tuple) – A tuple of tensors representing grover attributes
Returns:: embeddings – Atom embeddings generated from the hidden states of nodes and bonds, and bond embeddings generated from the hidden states of nodes and bonds.
Return type:: Tuple[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor]]

class GroverEmbedding(node_fdim, edge_fdim, hidden_size=128, depth=1, undirected=False, dropout=0.2, activation='relu', num_mt_block=1, num_heads=4, bias=False, res_connection=False)[source]¶

GroverEmbedding layer.

This layer is a simple wrapper over GroverTransEncoder layer for retrieving the embeddings from the GroverTransEncoder layer.

Parameters:

edge_fdim (int) – the dimension of additional feature for edge/bond.
node_fdim (int) – the dimension of additional feature for node/atom.
depth (int) – Dynamic message passing depth for use in MPNEncoder
undirected (bool) – The message passing is undirected or not
num_mt_block (int) – the number of message passing blocks.
num_heads (int) – the number of attention heads.

__init__(node_fdim, edge_fdim, hidden_size=128, depth=1, undirected=False, dropout=0.2, activation='relu', num_mt_block=1, num_heads=4, bias=False, res_connection=False)[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(graph_batch: List[Tensor])[source]¶

Forward function

Parameters:: graph_batch (List[torch.Tensor]) – A list containing f_atoms, f_bonds, a2b, b2a, b2revb, a_scope, b_scope, a2a
Returns:: embedding – Returns a dictionary of embeddings. The embeddings are:
- atom_from_atom: node messages aggregated from node hidden states
- bond_from_atom: bond messages aggregated from node hidden states
- atom_from_bond: node messages aggregated from bond hidden states
- bond_from_bond: bond messages aggregated from bond hidden states
Return type:: Dict[str, torch.Tensor]

class GroverAtomVocabPredictor(vocab_size: int, in_features: int = 128)[source]¶

Grover Atom Vocabulary Prediction Module.

The GroverAtomVocabPredictor module is used for predicting the atom vocabulary in the self-supervision task of the Grover architecture. One of the self-supervision tasks is to learn the contextual information of nodes (atoms). Contextual information is encoded as strings, like C_N-DOUBLE1_O-SINGLE1. The module accepts an atom embedding and learns to predict the contextual information of the atom as a multi-class classification problem.
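To make the multi-class target concrete, the sketch below builds a context string in the format of the example above and maps it to a class index. The exact format rule (centre atom symbol, then sorted `neighbour-BOND` keys with counts, joined by underscores) is inferred from that single example. The helpers `atom_context` and `context_to_class` and the `<unk>` entry are hypothetical. DeepChem's vocabulary builder may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple


def atom_context(symbol: str, neighbours: List[Tuple[str, str]]) -> str:
    """Build a label like 'C_N-DOUBLE1_O-SINGLE1' from (neighbour symbol, bond type) pairs."""
    counts = Counter(f"{nbr}-{bond}" for nbr, bond in neighbours)
    return "_".join([symbol] + [f"{key}{n}" for key, n in sorted(counts.items())])


def context_to_class(label: str, vocab: Dict[str, int], unk: str = "<unk>") -> int:
    """Map a context string to a class index; unseen contexts fall back to <unk>."""
    return vocab.get(label, vocab[unk])


label = atom_context("C", [("O", "SINGLE"), ("N", "DOUBLE")])
vocab = {"<unk>": 0, "C_N-DOUBLE1_O-SINGLE1": 1}
print(label, context_to_class(label, vocab))  # C_N-DOUBLE1_O-SINGLE1 1
```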

Example

>>> import torch
>>> from deepchem.models.torch_models.grover_layers import GroverAtomVocabPredictor
>>> num_atoms, in_features, vocab_size = 30, 16, 10
>>> layer = GroverAtomVocabPredictor(vocab_size, in_features)
>>> embedding = torch.randn(num_atoms, in_features)
>>> result = layer(embedding)
>>> result.shape
torch.Size([30, 10])

__init__(vocab_size: int, in_features: int = 128)[source]¶

Initializing Grover Atom Vocabulary Predictor

Parameters:

vocab_size (int) – size of vocabulary (vocabulary here is the total number of different possible contexts)
in_features (int) – feature size of atom embeddings.

forward(embeddings)[source]¶

Parameters:: embeddings (torch.Tensor) – the atom embeddings of shape (num_atoms, in_features)
Returns:: logits – the prediction for each atom, of shape (num_atoms, vocab_size)
Return type:: torch.Tensor

class GroverBondVocabPredictor(vocab_size: int, in_features: int = 128)[source]¶

Layer for learning contextual information for bonds.

The layer is used in Grover architecture to learn contextual information of a bond by predicting the context of a bond from the bond embedding in a multi-class classification setting. The contextual information of a bond are encoded as strings (ex: ‘(DOUBLE-STEREONONE-NONE)_C-(SINGLE-STEREONONE-NONE)2’).
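In the example below, 2 * num_bonds embeddings produce num_bonds predictions. One way this can work is sketched here. The sketch assumes that rows 2i and 2i + 1 hold the two directions of bond i, as in chemprop-style graphs where each bond is stored in both directions. It also assumes that each direction is scored by its own function and that the two scores are summed. Both assumptions, and the helpers `score_fwd`, `score_rev` and `bond_vocab_log_probs`, are hypothetical.

```python
import math
from typing import Callable, List

Vec = List[float]
Scorer = Callable[[Vec], Vec]  # hypothetical per-direction scoring function


def log_softmax(scores: Vec) -> Vec:
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def bond_vocab_log_probs(directed: List[Vec], score_fwd: Scorer, score_rev: Scorer) -> List[Vec]:
    """Map 2 * num_bonds directed-bond embeddings to num_bonds rows of log-probabilities."""
    if len(directed) % 2:
        raise ValueError("expected an even number of directed-bond embeddings")
    out = []
    for i in range(0, len(directed), 2):
        # Combine both directions so each undirected bond gets one prediction.
        scores = [f + r for f, r in zip(score_fwd(directed[i]), score_rev(directed[i + 1]))]
        out.append(log_softmax(scores))
    return out


def score_fwd(v: Vec) -> Vec:  # hypothetical, 3 classes
    return [sum(v), 0.0, -sum(v)]


def score_rev(v: Vec) -> Vec:  # hypothetical, 3 classes
    return [0.5 * v[0], v[-1], 0.0]


embeddings = [[0.1, 0.2], [0.3, -0.1], [1.0, 0.0], [0.0, 1.0]]  # 2 bonds x 2 directions
log_probs = bond_vocab_log_probs(embeddings, score_fwd, score_rev)
```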

Example

>>> import torch
>>> from deepchem.models.torch_models.grover_layers import GroverBondVocabPredictor
>>> num_bonds = 20
>>> in_features, vocab_size = 16, 10
>>> layer = GroverBondVocabPredictor(vocab_size, in_features)
>>> embedding = torch.randn(num_bonds * 2, in_features)
>>> result = layer(embedding)
>>> result.shape
torch.Size([20, 10])

__init__(vocab_size: int, in_features: int = 128)[source]¶

Initializes GroverBondVocabPredictor

Parameters:

vocab_size (int) – Size of vocabulary, used for number of classes in prediction.
in_features (int, default: 128) – Input feature size of bond embeddings.

forward(embeddings)[source]¶

Parameters:: embeddings (torch.Tensor) – bond embeddings of shape (2 * num_bond, in_features), i.e. one row per bond direction, as in the example above
Returns:: logits – the prediction for each bond, of shape (num_bond, vocab_size)
Return type:: torch.Tensor

class GroverFunctionalGroupPredictor(functional_group_size: int, in_features=128)[source]¶

The functional group prediction task for self-supervised learning.

Molecules contain functional groups. This module predicts which functional groups are present, formulated as a multi-label classification problem.
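In a multi-label setting, each functional group gets its own independent sigmoid instead of one softmax over all groups, so several labels can be active at once. The plain-Python sketch below shows the standard numerically stable binary cross-entropy on logits and a thresholded prediction. Using this particular loss for GroverFunctionalGroupPredictor is an assumption, and `bce_with_logits` and `predict_groups` are hypothetical helpers.

```python
import math
from typing import List


def bce_with_logits(logits: List[float], targets: List[float]) -> float:
    """Mean binary cross-entropy over independent labels (stable form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        # max(x, 0) - x*y + log(1 + exp(-|x|)) == -[y log s(x) + (1 - y) log(1 - s(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def predict_groups(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Predict each group independently; sigmoid(x) >= threshold <=> x >= logit(threshold)."""
    cut = math.log(threshold / (1.0 - threshold))
    return [int(x >= cut) for x in logits]


print(predict_groups([2.0, -1.0, 0.0]))  # [1, 0, 1]
```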

Parameters:

functional_group_size (int) – number of functional groups, i.e. the number of output labels
in_features (int) – hidden layer size, default 128

Example

>>> import torch
>>> from deepchem.models.torch_models.grover_layers import GroverFunctionalGroupPredictor
>>> in_features, functional_group_size = 8, 20
>>> num_atoms, num_bonds = 10, 20
>>> predictor = GroverFunctionalGroupPredictor(functional_group_size=20, in_features=8)
>>> atom_scope, bond_scope = [(0, 3), (3, 3), (6, 4)], [(0, 5), (5, 4), (9, 11)]
>>> embeddings = {}
>>> embeddings['bond_from_atom'] = torch.randn(num_bonds, in_features)
>>> embeddings['bond_from_bond'] = torch.randn(num_bonds, in_features)
>>> embeddings['atom_from_atom'] = torch.randn(num_atoms, in_features)
>>> embeddings['atom_from_bond'] = torch.randn(num_atoms, in_features)
>>> result = predictor(embeddings, atom_scope, bond_scope)

__init__(functional_group_size: int, in_features=128)[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(embeddings: Dict, atom_scope: List, bond_scope: List)[source]¶

The forward function for the GroverFunctionalGroupPredictor (semantic motif prediction) layer. It takes atom/bond embeddings produced from node and bond hidden states by the GroverEmbedding module, together with the atom and bond scopes, and produces prediction logits for each embedding. The scopes are used to distinguish the atoms/bonds that belong to different molecules in a batched molecular graph.

Parameters:

embeddings (Dict) – The input embeddings organized as a dictionary. The input embeddings are the output of the GroverEmbedding layer.
atom_scope (List) – The scope for atoms.
bond_scope (List) – The scope for bonds

Returns:

preds (Dict) – A dictionary containing the predicted functional-group logits for four different types of input embeddings. The keys and their corresponding predictions are:
- atom_from_atom - prediction logits from atom embeddings generated via node hidden states
- atom_from_bond - prediction logits from atom embeddings generated via bond hidden states
- bond_from_atom - prediction logits from bond embeddings generated via node hidden states
- bond_from_bond - prediction logits from bond embeddings generated via bond hidden states
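The scopes in the example above, `[(0, 3), (3, 3), (6, 4)]` for 10 atoms and `[(0, 5), (5, 4), (9, 11)]` for 20 bonds, read as `(start, size)` pairs whose sizes sum to the batch total. The sketch below uses that reading to split a batched tensor into per-molecule segments and mean-pool each one. Mean pooling is an illustrative choice, not necessarily the readout this layer uses, and the helpers are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def split_by_scope(rows: Sequence[Vec], scope: Sequence[Tuple[int, int]]) -> List[List[Vec]]:
    """Slice a batched tensor into per-molecule segments; scope holds (start, size)."""
    return [list(rows[start:start + size]) for start, size in scope]


def mean_pool(segment: List[Vec], dim: int) -> Vec:
    """Average the rows of one molecule; an empty molecule gives a zero vector."""
    if not segment:
        return [0.0] * dim
    return [sum(col) / len(segment) for col in zip(*segment)]


atom_scope = [(0, 3), (3, 3), (6, 4)]        # from the example above
atom_rows = [[float(i)] for i in range(10)]  # 10 hypothetical 1-d atom embeddings
per_molecule = [mean_pool(seg, dim=1) for seg in split_by_scope(atom_rows, atom_scope)]
print(per_molecule)  # [[1.0], [4.0], [7.5]]
```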

class GroverPretrain(embedding: Module, atom_vocab_task_atom: Module, atom_vocab_task_bond: Module, bond_vocab_task_atom: Module, bond_vocab_task_bond: Module, functional_group_predictor: Module)[source]¶

The Grover Pretrain module.

The GroverPretrain module is used for training an embedding based on the Grover Pretraining task. Grover pretraining is a self-supervised task where an embedding is trained to learn the contextual information of atoms and bonds along with graph-level properties, which are functional groups in case of molecular graphs.

Parameters:

embedding (nn.Module) – An embedding layer to generate embedding from input molecular graph
atom_vocab_task_atom (nn.Module) – A layer used for predicting atom vocabulary from atom features generated via atom hidden states.
atom_vocab_task_bond (nn.Module) – A layer used for predicting atom vocabulary from atom features generated via bond hidden states.
bond_vocab_task_atom (nn.Module) – A layer used for predicting bond vocabulary from bond features generated via atom hidden states.
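bond_vocab_task_bond (nn.Module) – A layer used for predicting bond vocabulary from bond features generated via bond hidden states.
functional_group_predictor (nn.Module) – A layer used for predicting functional groups from the atom and bond embeddings.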

Returns:

prediction_logits (Tuple) – A tuple containing the following prediction logits:
- prediction logits for the atom vocabulary task from atom hidden states,
- prediction logits for the atom vocabulary task from bond hidden states,
- prediction logits for the bond vocabulary task from atom hidden states,
- prediction logits for the bond vocabulary task from bond hidden states,
- functional group prediction logits from atom embeddings generated from atom and bond hidden states,
- functional group prediction logits from bond embeddings generated from atom and bond hidden states.

Example

>>> import deepchem as dc
>>> from deepchem.utils.grover import BatchGroverGraph
>>> from deepchem.models.torch_models.grover import GroverPretrain
>>> from deepchem.models.torch_models.grover_layers import GroverEmbedding, GroverAtomVocabPredictor, GroverBondVocabPredictor, GroverFunctionalGroupPredictor
>>> smiles = ['CC', 'CCC', 'CC(=O)C']

>>> fg = dc.feat.CircularFingerprint()
>>> featurizer = dc.feat.GroverFeaturizer(features_generator=fg)

>>> graphs = featurizer.featurize(smiles)
>>> batched_graph = BatchGroverGraph(graphs)
>>> grover_graph_attributes = batched_graph.get_components()
>>> f_atoms, f_bonds, a2b, b2a, b2revb, a2a, a_scope, b_scope, _ = grover_graph_attributes
>>> components = {}
>>> components['embedding'] = GroverEmbedding(node_fdim=f_atoms.shape[1], edge_fdim=f_bonds.shape[1])
>>> components['atom_vocab_task_atom'] = GroverAtomVocabPredictor(vocab_size=10, in_features=128)
>>> components['atom_vocab_task_bond'] = GroverAtomVocabPredictor(vocab_size=10, in_features=128)
>>> components['bond_vocab_task_atom'] = GroverBondVocabPredictor(vocab_size=10, in_features=128)
>>> components['bond_vocab_task_bond'] = GroverBondVocabPredictor(vocab_size=10, in_features=128)
>>> components['functional_group_predictor'] = GroverFunctionalGroupPredictor(10)
>>> model = GroverPretrain(**components)

>>> inputs = f_atoms, f_bonds, a2b, b2a, b2revb, a_scope, b_scope, a2a
>>> output = model(inputs)

__init__(embedding: Module, atom_vocab_task_atom: Module, atom_vocab_task_bond: Module, bond_vocab_task_atom: Module, bond_vocab_task_bond: Module, functional_group_predictor: Module)[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(graph_batch)[source]¶

Forward function

Parameters:: graph_batch (List[torch.Tensor]) – A list containing grover graph attributes

class GroverFinetune(embedding: Module, readout: Module, mol_atom_from_atom_ffn: Module, mol_atom_from_bond_ffn: Module, hidden_size: int = 128, mode: str = 'regression', n_tasks: int = 1, n_classes: int | None = None)[source]¶

Grover Finetune model.

For a graph-level prediction task, the GroverFinetune model uses the node/edge embeddings output by the GroverEmbedding layer, applies a readout function to them to get graph embeddings, and uses additional MLP layers to predict the property of the molecular graph.

Parameters:

embedding (nn.Module) – An embedding layer to generate embedding from input molecular graph
readout (nn.Module) – A readout layer to perform readout over atom and bond hidden states
mol_atom_from_atom_ffn (nn.Module) – A feed forward network which learns representation from atom messages generated via atom hidden states of a molecular graph
mol_atom_from_bond_ffn (nn.Module) – A feed forward network which learns representation from atom messages generated via bond hidden states of a molecular graph
mode (str) – Either classification or regression

Returns:

prediction_logits – prediction logits

Return type:

torch.Tensor

Example

>>> import deepchem as dc
>>> from deepchem.utils.grover import BatchGroverGraph
>>> from deepchem.models.torch_models.grover_layers import GroverEmbedding
>>> from deepchem.models.torch_models.readout import GroverReadout
>>> from deepchem.models.torch_models.grover import GroverFinetune
>>> import torch.nn as nn
>>> smiles = ['CC', 'CCC', 'CC(=O)C']
>>> fg = dc.feat.CircularFingerprint()
>>> featurizer = dc.feat.GroverFeaturizer(features_generator=fg)
>>> graphs = featurizer.featurize(smiles)
>>> batched_graph = BatchGroverGraph(graphs)
>>> attributes = batched_graph.get_components()
>>> additional_features = batched_graph.additional_features
>>> f_atoms, f_bonds, a2b, b2a, b2revb, a2a, a_scope, b_scope, fg_labels = attributes
>>> inputs = f_atoms, f_bonds, a2b, b2a, b2revb, a_scope, b_scope, a2a
>>> components = {}
>>> components['embedding'] = GroverEmbedding(node_fdim=f_atoms.shape[1], edge_fdim=f_bonds.shape[1])
>>> components['readout'] = GroverReadout(rtype="mean", in_features=128)
>>> components['mol_atom_from_atom_ffn'] = nn.Linear(in_features=additional_features.shape[1] + 128, out_features=128)
>>> components['mol_atom_from_bond_ffn'] = nn.Linear(in_features=additional_features.shape[1] + 128, out_features=128)
>>> model = GroverFinetune(**components, mode='regression', hidden_size=128)
>>> _ = model.eval()  # inference mode for the model and all submodules
>>> output = model((inputs, additional_features))

__init__(embedding: Module, readout: Module, mol_atom_from_atom_ffn: Module, mol_atom_from_bond_ffn: Module, hidden_size: int = 128, mode: str = 'regression', n_tasks: int = 1, n_classes: int | None = None)[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(inputs)[source]¶

Parameters:: inputs (Tuple) – A tuple of the GROVER batch graph attributes and the additional molecule-level features, as in the example above

Attention Layers¶

class ScaledDotProductAttention[source]¶

The Scaled Dot-Product Attention operation from the Attention Is All You Need paper (https://arxiv.org/abs/1706.03762).
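The operation computes softmax(Q K^T / sqrt(d_k)) V. The single-head, unbatched plain-Python sketch below mirrors the `query`, `key`, `value` and `mask` arguments documented in this section. It assumes that positions where `mask` is 0 are excluded by giving them a large negative score before the softmax. It omits the optional dropout, and `scaled_dot_product_attention` here is a hypothetical helper, not the DeepChem class.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(query: Matrix, key: Matrix, value: Matrix,
                                 mask: Optional[List[List[int]]] = None) -> Tuple[Matrix, Matrix]:
    """softmax(Q K^T / sqrt(d_k)) V for one head without batching."""
    d_k = len(query[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in key]
              for q_row in query]
    if mask is not None:
        # Masked-out positions get a large negative score, i.e. ~zero attention weight.
        scores = [[s if keep else -1e9 for s, keep in zip(s_row, m_row)]
                  for s_row, m_row in zip(scores, mask)]
    weights = [softmax(row) for row in scores]
    output = [[sum(w * v_row[c] for w, v_row in zip(w_row, value)) for c in range(len(value[0]))]
              for w_row in weights]
    return output, weights


q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, attn = scaled_dot_product_attention(q, k, v, mask=[[1, 0], [1, 1]])
```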

Example

>>> import torch
>>> import torch.nn as nn
>>> from deepchem.models import ScaledDotProductAttention as SDPA
>>> attn = SDPA()
>>> x = torch.ones(1, 5)
>>> # Learnable element-wise weights for making the query, key and value
>>> Q, K, V = nn.Parameter(torch.ones(5)), nn.Parameter(torch.ones(5)), nn.Parameter(torch.ones(5))
>>> query, key, value = Q * x, K * x, V * x
>>> x_out, attn_score = attn(query, key, value)

__init__()[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(query: Tensor, key: Tensor, value: Tensor, mask: Tensor | None = None, dropout: Dropout | None = None)[source]¶

Parameters:

query (torch.Tensor) – Query tensor for attention
key (torch.Tensor) – Key tensor for attention
value (torch.Tensor) – Value tensor for attention
mask (torch.Tensor (optional)) – Mask to apply during attention computation
dropout (nn.Dropout (optional)) – Dropout layer for attention output

class SelfAttention(in_features, out_features, hidden_size=128)[source]¶

SelfAttention Layer

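Given $X \in \mathbb{R}^{n \times \text{in\_feature}}$, the attention is calculated by $a = \mathrm{softmax}(W_2 \tanh(W_1 X^\top))$, where $W_1 \in \mathbb{R}^{\text{hidden} \times \text{in\_feature}}$, $W_2 \in \mathbb{R}^{\text{out\_feature} \times \text{hidden}}$, and the softmax is taken over the $n$ inputs. The final output is $y = aX$, where $y \in \mathbb{R}^{\text{out\_feature} \times \text{in\_feature}}$.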
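The shapes in the formula can be checked with a plain-Python sketch. X is n x in_feature, W_1 is hidden x in_feature and W_2 is out_feature x hidden, so the attention matrix is out_feature x n and y = aX is out_feature x in_feature. The small weight matrices below are arbitrary illustrative values, and `self_attention` is a hypothetical helper rather than the DeepChem module.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, Matrix]:
    """x: n x in_feature, w1: hidden x in_feature, w2: out_feature x hidden."""
    h = [[math.tanh(v) for v in row] for row in matmul(w1, transpose(x))]  # hidden x n
    a = [softmax(row) for row in matmul(w2, h)]                            # out_feature x n
    return matmul(a, x), a                                                 # out_feature x in_feature


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                 # n = 3, in_feature = 2
w1 = [[0.1, -0.2], [0.3, 0.1], [0.0, 0.5], [0.2, 0.2]]   # hidden = 4
w2 = [[0.0] * 4, [1.0, -1.0, 0.5, 0.0]]                  # out_feature = 2
y, attn = self_attention(x, w1, w2)
```

A zero row in W_2 yields uniform attention, so the corresponding output row is simply the mean of the input rows.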

Parameters:

in_features (int) – Dimension of input features
out_features (int) – Dimension of output features
hidden_size (int) – Dimension of hidden layer

__init__(in_features, out_features, hidden_size=128)[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(X)[source]¶

The forward function.

Parameters:

X (torch.Tensor) – input feature of shape $\mathbb{R}^{n \times \text{in\_feature}}$.

Returns:

embedding (torch.Tensor) – The final embedding of shape $\mathbb{R}^{\text{out\_feature} \times \text{in\_feature}}$
attention-matrix (torch.Tensor) – The attention matrix

class SphericalHarmonics(max_degree: int)[source]¶

Custom computation of spherical harmonics up to a specified degree.

Spherical harmonics are implemented to capture rotationally equivariant features based on interatomic relative positions.

Parameters:: max_degree (int) – Maximum degree of the spherical harmonics.
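The last dimension in the examples (9 for max_degree=2, 4 for max_degree=1) equals the number of (l, m) pairs with 0 <= l <= max_degree and -l <= m <= l, which is (max_degree + 1)^2. The sketch below computes that count and converts a Cartesian relative position into (r, polar angle theta, azimuth phi), the inputs that spherical harmonics need. The angle convention and the choice to return zero angles for the zero vector (self-pairs i == j) are assumptions. `sh_dim` and `to_spherical` are hypothetical helpers.

```python
import math
from typing import Tuple


def sh_dim(max_degree: int) -> int:
    """Number of (l, m) pairs up to max_degree; equals (max_degree + 1) ** 2."""
    return sum(2 * l + 1 for l in range(max_degree + 1))


def to_spherical(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Cartesian relative position -> (r, polar angle theta, azimuth phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Self-pairs have zero relative position; the angles are undefined there.
        return 0.0, 0.0, 0.0
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp against rounding error
    phi = math.atan2(y, x)
    return r, theta, phi


print(sh_dim(2), sh_dim(1))  # 9 4
```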

Example

>>> import torch
>>> sh = SphericalHarmonics(max_degree=2)
>>> relative_positions = torch.randn(3, 5, 5, 3)
>>> result = sh.compute(relative_positions)
>>> result.shape
torch.Size([3, 5, 5, 9])

__init__(max_degree: int) → None[source]¶

Initialize the custom spherical harmonics calculator.

Parameters:: max_degree (int) – Maximum degree of the spherical harmonics.

compute_legendre_polynomials(l: int, m: int, x: Tensor) → Tensor

Compute the associated Legendre polynomial.

Parameters:

l (int) – Degree of the polynomial.
m (int) – Order of the polynomial.
x (torch.Tensor) – Input tensor.

Returns:

Computed Legendre polynomial values.

Return type:

torch.Tensor

Example

>>> sh = SphericalHarmonics(max_degree=2)
>>> x = torch.tensor(0.5)
>>> sh.compute_legendre_polynomials(1, 0, x)
tensor(0.5000)

compute_spherical_harmonics(l: int, m: int, theta: Tensor, phi: Tensor) → Tensor

Compute the spherical harmonics Y_l^m(theta, phi).

Parameters:

l (int) – Degree of the spherical harmonics.
m (int) – Order of the spherical harmonics.
theta (torch.Tensor) – Polar angles in radians.
phi (torch.Tensor) – Azimuthal angles in radians.

Returns:

Spherical harmonics values.

Return type:

torch.Tensor

Example

>>> sh = SphericalHarmonics(max_degree=2)
>>> theta = torch.tensor(0.5)
>>> phi = torch.tensor(1.0)
>>> sh.compute_spherical_harmonics(1, 0, theta, phi)
tensor(0.4288+0.j)
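
As a pure-Python sketch of these two methods, the associated Legendre polynomials can be evaluated with the standard upward recurrence in l and then normalized into the complex Y_l^m. The Condon-Shortley phase and the m = -l..l ordering are assumptions, and DeepChem's conventions may differ. For m = 0 the sketch reproduces the two example values above (0.5 and about 0.4288).

```python
import cmath
import math


def assoc_legendre(l: int, m: int, x: float) -> float:
    """P_l^m(x) for 0 <= m <= l, including the Condon-Shortley phase (-1)^m."""
    if not 0 <= m <= l:
        raise ValueError("expected 0 <= m <= l")
    # Seed: P_m^m(x) = (-1)^m (2m - 1)!! (1 - x^2)^(m/2)
    somx2 = math.sqrt(max(0.0, 1.0 - x * x))
    pmm, odd = 1.0, 1.0
    for _ in range(m):
        pmm *= -odd * somx2
        odd += 2.0
    if l == m:
        return pmm
    # P_{m+1}^m(x) = x (2m + 1) P_m^m(x)
    pmmp1 = x * (2 * m + 1) * pmm
    # Upward recurrence: (l - m) P_l^m = (2l - 1) x P_{l-1}^m - (l + m - 1) P_{l-2}^m
    for ll in range(m + 2, l + 1):
        pmm, pmmp1 = pmmp1, ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def spherical_harmonic(l: int, m: int, theta: float, phi: float) -> complex:
    """Complex Y_l^m(theta, phi); theta is the polar angle, phi the azimuth."""
    if m < 0:
        # Y_l^{-|m|} = (-1)^|m| * conj(Y_l^{|m|})
        return (-1) ** (-m) * spherical_harmonic(l, -m, theta, phi).conjugate()
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)


def all_harmonics(max_degree: int, theta: float, phi: float) -> list:
    """Y_l^m for l = 0..max_degree and m = -l..l: (max_degree + 1)**2 values."""
    return [spherical_harmonic(l, m, theta, phi)
            for l in range(max_degree + 1)
            for m in range(-l, l + 1)]
```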

compute(relative_positions: Tensor) → Tensor

Compute all spherical harmonics for relative positions.

Parameters:: relative_positions (torch.Tensor) – Tensor of shape (B, N, N, 3) representing relative positions.
Returns:: Spherical harmonics tensor of shape (B, N, N, SH_dim).
Return type:: torch.Tensor

Example

>>> sh = SphericalHarmonics(max_degree=1)
>>> rel_positions = torch.randn(1, 3, 3, 3)
>>> sh.compute(rel_positions).shape
torch.Size([1, 3, 3, 4])
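
Before the harmonics can be evaluated, each relative-position vector has to be expressed in spherical coordinates. The NumPy sketch below shows that step and the component count (max_degree + 1)^2 reflected in the output shapes above. The angle conventions (polar angle measured from +z, azimuth from arctan2) and the helper names are assumptions, not a description of DeepChem's code.

```python
import numpy as np


def cartesian_to_spherical(rel):
    """Split vectors of shape (..., 3) into radius, polar angle and azimuth."""
    x, y, z = rel[..., 0], rel[..., 1], rel[..., 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    # Polar angle from the +z axis; guard r == 0 (e.g. the i == j diagonal)
    theta = np.arccos(np.clip(z / np.where(r > 0, r, 1.0), -1.0, 1.0))
    phi = np.arctan2(y, x)  # azimuth in (-pi, pi]
    return r, theta, phi


def sh_dim(max_degree):
    # One component per (l, m) pair with -l <= m <= l
    return (max_degree + 1) ** 2


rel = np.random.default_rng(0).normal(size=(1, 3, 3, 3))
r, theta, phi = cartesian_to_spherical(rel)
```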

class SE3Attention(embed_dim: int, num_heads: int, sh_max_degree: int = 2)

SE(3) Attention Module with Spherical Harmonics.

This module is designed for 3D atomic or molecular data, using spherical harmonics to compute rotationally equivariant attention based on interatomic distances and relative positions. It ensures SE(3)-equivariance for both feature and coordinate updates.

Parameters:

embed_dim (int) – Dimensionality of feature embeddings.
num_heads (int) – Number of attention heads.
sh_max_degree (int) – Maximum degree of spherical harmonics.

Example

>>> layer = SE3Attention(embed_dim=64, num_heads=4, sh_max_degree=2)
>>> x = torch.randn(1, 10, 64)  # Default dtype torch.float32
>>> coords = torch.randn(1, 10, 3)  # Default dtype torch.float32
>>> features, coords = layer(x, coords)
>>> features.shape, coords.shape
(torch.Size([1, 10, 64]), torch.Size([1, 10, 3]))

__init__(embed_dim: int, num_heads: int, sh_max_degree: int = 2) → None

Initialize the SE(3) Attention Module.

Parameters:

embed_dim (int) – Dimensionality of feature embeddings.
num_heads (int) – Number of attention heads.
sh_max_degree (int, optional) – Maximum degree of spherical harmonics. Default is 2.

compute_spherical_harmonics(coords: Tensor) → Tuple[Tensor, Tensor]

Compute distances and spherical harmonics for relative positions.

Parameters:: coords (torch.Tensor) – Input coordinates tensor of shape (B, N, 3).
Returns:: Pairwise distances of shape (B, N, N, 1) and spherical harmonics of shape (B, N, N, SH_dim).
Return type:: Tuple[torch.Tensor, torch.Tensor]

Example

>>> coords = torch.randn(1, 10, 3)
>>> layer = SE3Attention(embed_dim=64, num_heads=4, sh_max_degree=2)
>>> dist, sh = layer.compute_spherical_harmonics(coords)
>>> dist.shape, sh.shape
(torch.Size([1, 10, 10, 1]), torch.Size([1, 10, 10, 16]))
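
The distance half of this method can be sketched in NumPy by broadcasting (B, N, 3) coordinates into (B, N, N, 3) relative vectors. The sign convention coords[i] - coords[j] is an assumption. The final lines check that the distances do not change when all atoms are rotated (or reflected) and shifted together.

```python
import numpy as np


def pairwise_relative_positions(coords):
    """coords: (B, N, 3) -> relative vectors (B, N, N, 3) and distances (B, N, N, 1)."""
    # rel[b, i, j] = coords[b, i] - coords[b, j]  (this sign convention is an assumption)
    rel = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(rel, axis=-1, keepdims=True)
    return rel, dist


rng = np.random.default_rng(0)
coords = rng.normal(size=(1, 10, 3))
rel, dist = pairwise_relative_positions(coords)
# Random orthogonal matrix plus a shift, used to check that distances are invariant
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
_, dist_moved = pairwise_relative_positions(coords @ Q.T + rng.normal(size=3))
```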

forward(x: Tensor, coords: Tensor) → Tuple[Tensor, Tensor]

Perform attention computation and coordinate updates.

Parameters:

x (torch.Tensor) – Input feature tensor of shape (B, N, embed_dim).
coords (torch.Tensor) – Input coordinate tensor of shape (B, N, 3).

Returns:

Updated feature tensor of shape (B, N, embed_dim) and updated coordinate tensor of shape (B, N, 3).

Return type:

Tuple[torch.Tensor, torch.Tensor]

Example

>>> layer = SE3Attention(embed_dim=64, num_heads=4, sh_max_degree=2)
>>> x = torch.randn(1, 10, 64)
>>> coords = torch.randn(1, 10, 3)
>>> features, coords = layer(x, coords)
>>> features.shape, coords.shape
(torch.Size([1, 10, 64]), torch.Size([1, 10, 3]))

class Fiber(num_degrees: int | None = None, num_channels: int | None = None, structure: List[Tuple[int, int]] | None = None, dictionary: Dict[int, int] | None = None)

Data Structure for Fibers in SE(3)-Transformers.

Fibers represent structured feature spaces used in equivariant neural networks, particularly in SE(3)-Transformer models. This class provides utilities for defining, manipulating, and combining fiber structures.

Example

>>> from deepchem.models.torch_models.layers import Fiber
>>> fiber1 = Fiber(num_degrees=3, num_channels=16)
>>> fiber2 = Fiber(dictionary={0: 16, 1: 8, 2: 4})
>>> combined_fiber = Fiber.combine(fiber1, fiber2)
>>> combined_fiber.structure
[(32, 0), (24, 1), (20, 2)]
>>> combined_fiber.multiplicities
(32, 24, 20)

__init__(num_degrees: int | None = None, num_channels: int | None = None, structure: List[Tuple[int, int]] | None = None, dictionary: Dict[int, int] | None = None) → None

Initialize a Fiber structure.

Parameters:

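num_degrees (int, optional) – Number of degrees in the fiber, which covers degrees 0 to num_degrees - 1 (num_degrees=3 gives degrees 0, 1 and 2, as in the example above).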
num_channels (int, optional) – Number of channels per degree.
structure (List[Tuple[int, int]], optional) – Custom fiber structure as (num_channels, degree) pairs.
dictionary (dict, optional) – Dictionary representation {degree: num_channels}.

static combine(f1: Fiber, f2: Fiber) → Fiber

Merge two Fiber instances by adding the number of channels (multiplicities) for degrees that appear in both fibers. Degrees that appear in only one fiber keep their channel count unchanged.

Parameters:

f1 (Fiber) – First fiber to combine.
f2 (Fiber) – Second fiber to combine.

Returns:

A new fiber with combined structure.

Return type:

Fiber

Example

>>> from deepchem.models.torch_models.layers import Fiber
>>> fiber1 = Fiber(dictionary={0: 16, 1: 8})
>>> fiber2 = Fiber(dictionary={1: 8, 2: 4})
>>> combined = Fiber.combine(fiber1, fiber2)
>>> combined.structure
[(16, 0), (16, 1), (4, 2)]
>>> combined.multiplicities
(16, 16, 4)

static combine_max(f1: Fiber, f2: Fiber) → Fiber

Merge two Fiber instances by taking the maximum number of channels (multiplicities) for degrees that appear in both fibers. Degrees that appear in only one fiber keep their channel count unchanged.

Parameters:

f1 (Fiber) – First fiber to combine.
f2 (Fiber) – Second fiber to combine.

Returns:

A new fiber with maximum multiplicities for each degree.

Return type:

Fiber

Example

>>> from deepchem.models.torch_models.layers import Fiber
>>> fiber1 = Fiber(dictionary={0: 16, 1: 8})
>>> fiber2 = Fiber(dictionary={1: 12, 2: 4})
>>> combined_max = Fiber.combine_max(fiber1, fiber2)
>>> combined_max.structure
[(16, 0), (12, 1), (4, 2)]
>>> combined_max.multiplicities
(16, 12, 4)
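
The bookkeeping behind combine and combine_max can be sketched with plain dictionaries that map degree to channel count. The helpers below (`from_degrees`, `to_structure`, `multiplicities`, `combine`, `combine_max`) are hypothetical stand-ins, not the Fiber class itself, but they reproduce the example outputs above.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers: a fiber is modeled as a {degree: num_channels} dict.


def from_degrees(num_degrees: int, num_channels: int) -> Dict[int, int]:
    """Same channel count for degrees 0 .. num_degrees - 1."""
    return {d: num_channels for d in range(num_degrees)}


def to_structure(f: Dict[int, int]) -> List[Tuple[int, int]]:
    """{degree: channels} -> [(channels, degree), ...] sorted by degree."""
    return [(f[d], d) for d in sorted(f)]


def multiplicities(f: Dict[int, int]) -> Tuple[int, ...]:
    return tuple(ch for ch, _ in to_structure(f))


def combine(f1: Dict[int, int], f2: Dict[int, int]) -> Dict[int, int]:
    """Add channel counts for shared degrees; copy the other degrees unchanged."""
    out = dict(f1)
    for d, ch in f2.items():
        out[d] = out.get(d, 0) + ch
    return out


def combine_max(f1: Dict[int, int], f2: Dict[int, int]) -> Dict[int, int]:
    """Keep the larger channel count for shared degrees."""
    out = dict(f1)
    for d, ch in f2.items():
        out[d] = max(out.get(d, 0), ch)
    return out
```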

class SE3LayerNorm(num_channels: int, **kwargs: Any)

SE(3)-equivariant layer normalization.

Layer normalization for SE(3)-equivariant atomic features. Unlike batch normalization, which computes statistics across the batch dimension, layer normalization computes the mean and variance over the channels of each sample separately. This makes it suitable for graph-based and transformer architectures, where batch statistics are not stable. Normalizing each sample's features to zero mean and unit variance improves training stability. In SE(3)-equivariant models such normalization is normally applied to rotation-invariant quantities (for example degree-0 features or feature norms), so it does not break SE(3) equivariance.

Example

>>> import torch
>>> from deepchem.models.torch_models.layers import SE3LayerNorm
>>> batch_size, num_channels = 10, 30
>>> layer = SE3LayerNorm(num_channels)
>>> x = torch.randn(batch_size, num_channels)
>>> output = layer(x)
>>> output.shape
torch.Size([10, 30])
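
Assuming SE3LayerNorm normalizes over the last (channel) axis like torch.nn.LayerNorm, which the example above suggests but this sketch does not confirm, the computation with a learnable per-channel scale `gamma` and shift `beta` looks like this in NumPy:

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last (channel) axis of each sample, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)   # biased variance, as torch.nn.LayerNorm uses
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


batch_size, num_channels = 10, 30
x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(batch_size, num_channels))
out = layer_norm(x, gamma=np.ones(num_channels), beta=np.zeros(num_channels))
```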

__init__(num_channels: int, **kwargs: Any) → None

Parameters:: num_channels (int) – Number of output channels for normalization.

forward(x: Tensor) → Tensor

Apply layer normalization.

Parameters:: x (torch.Tensor) – Input tensor of shape (…, num_channels).

Returns:: Normalized tensor with the same shape as input.
Return type:: torch.Tensor

class SE3RadialFunc(num_freq: int, in_dim: int, out_dim: int, edge_dim: int = 0)

Defines the radial profile used in SE(3)-equivariant kernels. The radial function acts as a filter that modulates the interaction strength between features according to their relative distance. It transforms edge features while leaving the angular components untouched, which keeps the kernel SE(3)-equivariant. It is implemented as a fully connected network (MLP) with several linear layers, which lets the model learn flexible distance-based interactions. ReLU activations add non-linearity so the network can capture complex relationships, and batch normalization (BN) layers after the intermediate transformations improve training stability and feature scaling. The output is shaped so that it can be combined with the spherical-harmonic angular basis functions, so the learned kernels remain SE(3)-equivariant and can be used for SE(3)-equivariant message passing.

Example

>>> import torch
>>> from deepchem.models.torch_models.layers import SE3RadialFunc
>>> num_freq, in_dim, out_dim, edge_dim = 5, 10, 15, 4
>>> layer = SE3RadialFunc(num_freq, in_dim, out_dim, edge_dim)
>>> x = torch.randn(8, edge_dim + 1)
>>> output = layer(x)
>>> output.shape
torch.Size([8, 15, 1, 10, 1, 5])

__init__(num_freq: int, in_dim: int, out_dim: int, edge_dim: int = 0) → None

Parameters:

num_freq (int) – Number of frequency components.
in_dim (int) – Input feature dimension.
out_dim (int) – Output feature dimension.
edge_dim (int, optional) – Number of edge dimensions (default is 0).

forward(x: Tensor) → Tensor

Forward pass of the SE3RadialFunc layer.

Parameters:: x (torch.Tensor) – Input tensor of shape (…, edge_dim + 1)
Returns:: Output tensor of shape (-1, out_dim, 1, in_dim, 1, num_freq)
Return type:: torch.Tensor
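
A NumPy sketch of the shape bookkeeping: a small MLP maps each edge's (edge_dim + 1) features to num_freq * in_dim * out_dim numbers, which are reshaped with singleton axes so they broadcast against the angular basis. The two hidden layers of width 32 are assumptions, the batch normalization described above is omitted, and the weights are random. The sketch therefore shows the shapes only and does not reproduce SE3RadialFunc's values.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def radial_func(x, params, num_freq, in_dim, out_dim):
    """MLP over edge features (E, edge_dim + 1) -> (E, out_dim, 1, in_dim, 1, num_freq)."""
    h = relu(x @ params["W1"] + params["b1"])
    h = relu(h @ params["W2"] + params["b2"])
    y = h @ params["W3"] + params["b3"]          # (E, num_freq * in_dim * out_dim)
    # Singleton axes leave room to broadcast against the angular basis
    return y.reshape(-1, out_dim, 1, in_dim, 1, num_freq)


def init_params(in_features, mid_dim, n_out, rng):
    return {
        "W1": rng.normal(size=(in_features, mid_dim)), "b1": np.zeros(mid_dim),
        "W2": rng.normal(size=(mid_dim, mid_dim)), "b2": np.zeros(mid_dim),
        "W3": rng.normal(size=(mid_dim, n_out)), "b3": np.zeros(n_out),
    }


num_freq, in_dim, out_dim, edge_dim = 5, 10, 15, 4
rng = np.random.default_rng(0)
params = init_params(edge_dim + 1, 32, num_freq * in_dim * out_dim, rng)
x = rng.normal(size=(8, edge_dim + 1))
out = radial_func(x, params, num_freq, in_dim, out_dim)
```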

class SE3PairwiseConv(degree_in: int, nc_in: int, degree_out: int, nc_out: int, edge_dim: int = 0)

SE(3)-equivariant convolution between two single-type features. This layer implements a learnable convolution that preserves SE(3) equivariance by operating on pairwise interactions through a basis defined by spherical harmonics. A standard convolution is equivariant to translations only; this operation is equivariant to both rotations and translations in 3D space. The convolution uses SE(3)-equivariant kernels whose coefficients are learned by SE3RadialFunc. Because the kernel acts on pairwise feature interactions, feature transformations respect the geometric symmetries of the data. This is achieved by decomposing each interaction into radial and angular components:

- The radial component (distance-dependent) is learned by SE3RadialFunc and captures how feature strength varies with distance.
- The angular component (orientation-dependent) is handled by a spherical harmonics basis, so rotating the input transforms the output in a structured, predictable way.

Example

>>> from rdkit import Chem
>>> import dgl
>>> import deepchem as dc
>>> import shutil
>>> import os
>>> from deepchem.models.torch_models.layers import SE3PairwiseConv
>>> from deepchem.utils.equivariance_utils import get_spherical_from_cartesian, precompute_sh, basis_transformation_Q_J
>>> mol = Chem.MolFromSmiles('CCO')
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]
>>> G = features.to_dgl_graph()
>>> G.edata['w'] = torch.tensor(features.edge_weights, dtype=torch.float32)
>>> max_degree = 3
>>> distances = G.edata['edge_attr']
>>> r_ij = get_spherical_from_cartesian(distances)
>>> Y = precompute_sh(r_ij, 2*max_degree)
>>> basis = {}
>>> # Compute SE(3) basis for different (d_in, d_out) degrees
>>> for d_in in range(max_degree + 1):
...     for d_out in range(max_degree + 1):
...         K_Js = []
...         for J in range(abs(d_in - d_out), d_in + d_out + 1):
...             # Get spherical harmonic projection matrices
...             Q_J = basis_transformation_Q_J(J, d_in, d_out)
...             Q_J = Q_J.float().T
...             # Create kernel from spherical harmonics
...             K_J = torch.matmul(Y[J], Q_J)
...             K_Js.append(K_J)
...         # Reshape so can take linear combinations with a dot product
...         size = (-1, 1, 2 * d_out + 1, 1, 2 * d_in + 1, 2 * min(d_in, d_out) + 1)
...         basis[f"{d_in},{d_out}"] = torch.stack(K_Js, -1).view(*size)
>>> # Compute radial distances
>>> r = torch.sqrt(torch.sum(distances**2, -1, keepdim=True))
>>> # Add edge features
>>> if "w" in G.edata.keys():
...     w = G.edata["w"]
...     feat = torch.cat([w, r], -1)
... else:
...     feat = torch.cat([r], -1)
>>> pairwise_conv = SE3PairwiseConv(degree_in=0, nc_in=32, degree_out=0, nc_out=128, edge_dim=4)
>>> output = pairwise_conv(feat, basis)
>>> output.shape
torch.Size([4, 128, 32])
>>> dir_path = "cache"
>>> if os.path.exists(dir_path) and os.path.isdir(dir_path):
...     shutil.rmtree(dir_path)

__init__(degree_in: int, nc_in: int, degree_out: int, nc_out: int, edge_dim: int = 0) → None

Parameters:

degree_in (int) – Degree of the input feature.
nc_in (int) – Number of channels in the input feature.
degree_out (int) – Degree of the output feature.
nc_out (int) – Number of channels in the output feature.
edge_dim (int, optional) – Number of edge dimensions, default is 0.

forward(feat: Tensor, basis: dict) → Tensor

Forward pass of the SE3PairwiseConv layer.

Parameters:

feat (torch.Tensor) – Input tensor of shape (…, edge_dim + 1).
basis (dict) – Dictionary of basis functions with keys formatted as 'degree_in,degree_out' (for example '0,0').

Returns:: Convolved tensor of shape (num_edges, (2 * degree_out + 1) * nc_out, -1), where the last dimension is (2 * degree_in + 1) * nc_in.
Return type:: torch.Tensor
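
One way to see how the radial coefficients meet the angular basis is the NumPy sketch below. Radial weights shaped like SE3RadialFunc's output are broadcast against a basis entry shaped like basis["1,2"] in the example above, summed over the frequency axis, and flattened into one kernel matrix per edge. This follows the original SE(3)-Transformer design; whether DeepChem's forward does exactly this is an assumption, although the (4, 128, 32) output in the example is consistent with it. Random arrays stand in for real radial outputs and basis values.

```python
import numpy as np

E = 4                      # number of edges
degree_in, nc_in = 1, 3
degree_out, nc_out = 2, 4
d_in, d_out = 2 * degree_in + 1, 2 * degree_out + 1   # irreducible dimensions
num_freq = 2 * min(degree_in, degree_out) + 1         # one frequency per J

rng = np.random.default_rng(0)
# Radial coefficients, shaped like SE3RadialFunc's output
R = rng.normal(size=(E, nc_out, 1, nc_in, 1, num_freq))
# Angular basis for the (degree_in, degree_out) pair, shaped like basis["1,2"]
basis = rng.normal(size=(E, 1, d_out, 1, d_in, num_freq))

# Broadcast and sum over frequencies: one (d_out x d_in) block per channel pair
kernel = (R * basis).sum(axis=-1)          # (E, nc_out, d_out, nc_in, d_in)
kernel = kernel.reshape(E, nc_out * d_out, -1)
```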

class SE3Sum(f_x: Fiber, f_y: Fiber)

SE(3)-Equivariant Graph Residual Sum Function (SE3Sum).

This layer performs element-wise summation of SE(3)-equivariant node features, which enables skip (residual) connections in SE(3)-Transformers.

Given two feature representations:

- x: an SE(3)-equivariant feature tensor.
- y: another SE(3)-equivariant feature tensor.

The layer computes $$ h_{out}[d] = h_x[d] + h_y[d] $$ separately for each degree d.

If the number of feature channels differs, zero-padding is applied.

Example

>>> import torch
>>> from deepchem.models.torch_models.layers import Fiber, SE3Sum
>>> # Define Fiber Representations
>>> # Scalars (0) & vectors (1)
>>> f_x = Fiber(dictionary={0: 16, 1: 32})  # Scalars (0) & vectors (1)
>>> f_y = Fiber(dictionary={0: 16, 1: 32})
>>> # Initialize SE(3)-Equivariant Summation Layer
>>> se3_sum = SE3Sum(f_x, f_y)
>>> # Create Random Feature Inputs
>>> x = {'0': torch.randn(10, 16, 1), '1': torch.randn(10, 32, 3)}
>>> y = {'0': torch.randn(10, 16, 1), '1': torch.randn(10, 32, 3)}
>>> # Apply `SE3Sum` Layer
>>> output = se3_sum(x, y)
>>> for key, tensor in output.items():
...     print(tensor.shape)
torch.Size([10, 16, 1])
torch.Size([10, 32, 3])
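
A NumPy sketch of the degree-wise sum, including the zero-padding mentioned above when channel counts differ. Passing through degrees present in only one input follows the original SE(3)-Transformer design and is an assumption about this layer. The `se3_sum` and `_pad_channels` helpers are hypothetical.

```python
import numpy as np


def _pad_channels(t, c):
    # Zero-pad axis 1 (channels) of an (N, C, 2d+1) tensor up to c channels
    return np.pad(t, ((0, 0), (0, c - t.shape[1]), (0, 0)))


def se3_sum(x, y):
    """Degree-wise residual sum of two {degree: (N, C, 2d+1)} dictionaries."""
    out = {}
    for key in set(x) | set(y):
        if key in x and key in y:
            c = max(x[key].shape[1], y[key].shape[1])
            out[key] = _pad_channels(x[key], c) + _pad_channels(y[key], c)
        else:
            # Degree present in only one input: pass it through unchanged
            out[key] = x[key] if key in x else y[key]
    return out


rng = np.random.default_rng(0)
x = {'0': rng.normal(size=(10, 16, 1)), '1': rng.normal(size=(10, 32, 3))}
y = {'0': rng.normal(size=(10, 8, 1)), '1': rng.normal(size=(10, 32, 3))}
out = se3_sum(x, y)
```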

__init__(f_x: Fiber, f_y: Fiber)

Initializes the SE(3)-equivariant summation layer.

Parameters:

f_x (Fiber) – Fiber structure for the first input.
f_y (Fiber) – Fiber structure for the second input.

forward(x: Dict[str, Tensor], y: Dict[str, Tensor]) → Dict[str, Tensor]

Forward pass of the residual summation layer.

Parameters:

x (Dict[str, torch.Tensor]) – Input feature dictionary (first tensor).
y (Dict[str, torch.Tensor]) – Input feature dictionary (second tensor).

Returns:

out – Summed feature dictionary.

Return type:

Dict[str, torch.Tensor]

class SE3Cat(f_x: Fiber, f_y: Fiber)

SE(3)-Equivariant Graph Feature Concatenation (SE3Cat).

This layer concatenates features from two SE(3)-equivariant fiber representations. Unlike SE3Sum, which adds feature tensors, SE3Cat stacks them along the channel dimension. This operation is useful for combining features from different representations while preserving SE(3) equivariance.

Given two feature tensors:

- x[d]: an input feature tensor of degree d.
- y[d]: another input feature tensor of degree d.

The layer computes $$ h_{out}[d] = \text{Concat}\left( h_x[d], h_y[d] \right) $$ separately for each degree d. Concatenation is performed only for degrees in f_x; if a degree exists in x but not in y, x is used unchanged.

Example

>>> import torch
>>> from deepchem.models.torch_models.layers import Fiber, SE3Cat
>>> # Define Fiber Representations
>>> # Scalars (0) & vectors (1)
>>> f_x = Fiber(dictionary={0: 16, 1: 32})
>>> f_y = Fiber(dictionary={0: 16, 1: 32})
>>> # Initialize SE(3)-Equivariant Concatenation Layer
>>> se3cat = SE3Cat(f_x, f_y)
>>> # Create Random Feature Inputs
>>> x = {'0': torch.randn(4, 10, 16, 1), '1': torch.randn(4, 10, 32, 3)}
>>> y = {'0': torch.randn(4, 10, 16, 1), '1': torch.randn(4, 10, 32, 3)}
>>> # Apply `SE3Cat` Layer
>>> output = se3cat(x, y)
>>> for key, tensor in output.items():
...     print(tensor.shape)
torch.Size([4, 20, 16, 1])
torch.Size([4, 20, 32, 3])
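
A NumPy sketch of the degree-wise concatenation rule described above, concatenating along axis 1. For node features of shape (N, C, 2d+1), axis 1 is the channel axis. The example above feeds 4-D tensors, so there axis 1 has size 10 and becomes 20. The `se3_cat` helper is hypothetical.

```python
import numpy as np


def se3_cat(x, y):
    """Concatenate {degree: tensor} dictionaries along axis 1, degree by degree.

    Only degrees present in x are produced; a degree missing from y is passed
    through from x unchanged.
    """
    out = {}
    for key, tx in x.items():
        out[key] = np.concatenate([tx, y[key]], axis=1) if key in y else tx
    return out


rng = np.random.default_rng(0)
x = {'0': rng.normal(size=(10, 16, 1)), '1': rng.normal(size=(10, 32, 3))}
y = {'0': rng.normal(size=(10, 8, 1))}
out = se3_cat(x, y)
```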

__init__(f_x: Fiber, f_y: Fiber)

Initializes the SE(3)-equivariant concatenation layer.

Parameters:

f_x (Fiber) – Fiber structure for the first input.
f_y (Fiber) – Fiber structure for the second input.

forward(x: Dict[str, Tensor], y: Dict[str, Tensor]) → Dict[str, Tensor]

Forward pass of the concatenation layer.

Parameters:

x (Dict[str, torch.Tensor]) – Input feature dictionary (first tensor).
y (Dict[str, torch.Tensor]) – Input feature dictionary (second tensor).

Returns:

output – Concatenated feature dictionary.

Return type:

Dict[str, torch.Tensor]

class SE3AvgPooling(pooling_type: str = '0')

SE(3)-Equivariant Graph Average Pooling Module (SE3AvgPooling).

This layer performs average pooling over graph nodes while preserving SE(3) equivariance.

Given a set of node features $h_i$ over a graph $G$, the average pooling operation computes

$$ h_{\text{out}} = \frac{1}{|V|} \sum_{i \in V} h_i $$

where $V$ is the set of nodes in the graph.

For SE(3)-equivariant features, this layer performs:

- Degree 0 (scalars): standard average pooling.
- Degree 1 (vectors): component-wise average pooling.

Example

>>> import torch
>>> import dgl
>>> from deepchem.models.torch_models.layers import SE3AvgPooling, Fiber
>>> # Create a DGL Graph
>>> G = dgl.graph(([0, 1, 2], [3, 4, 5]), num_nodes=6)
>>> # Define Node Features
>>> features = {
...     '0': torch.randn(6, 16, 1),  # Scalar features (Degree 0)
...     '1': torch.randn(6, 32, 3)   # Vector features (Degree 1)
... }
>>>
>>> # Initialize SE(3)-Equivariant Average Pooling Layer
>>> pool_0 = SE3AvgPooling(pooling_type='0')  # For scalars
>>> pool_1 = SE3AvgPooling(pooling_type='1')  # For vectors
>>> # Apply Pooling
>>> pooled_0 = pool_0(features, G)
>>> pooled_1 = pool_1(features, G)
>>> print(pooled_0.shape)
torch.Size([1, 16])
>>> print(pooled_1['1'].shape)
torch.Size([1, 32, 3])
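
A NumPy sketch of per-graph mean pooling. An integer array of graph ids stands in for a batched DGL graph (an assumption made to keep the sketch free of DGL). Squeezing the last axis for degree 0 reproduces the shapes printed above. Averaging is linear, so it commutes with rotating every node's vectors, which is why component-wise averaging keeps degree-1 features equivariant.

```python
import numpy as np


def mean_pool(node_feats, graph_ids, num_graphs):
    """Average (N, ...) node features per graph -> (num_graphs, ...)."""
    out = np.zeros((num_graphs,) + node_feats.shape[1:])
    np.add.at(out, graph_ids, node_feats)            # scatter-add nodes into graphs
    counts = np.bincount(graph_ids, minlength=num_graphs)
    return out / counts.reshape((-1,) + (1,) * (node_feats.ndim - 1))


rng = np.random.default_rng(0)
features = {'0': rng.normal(size=(6, 16, 1)), '1': rng.normal(size=(6, 32, 3))}
graph_ids = np.zeros(6, dtype=int)                   # all six nodes in one graph
pooled_0 = mean_pool(features['0'], graph_ids, 1)[..., 0]    # scalars: (1, 16)
pooled_1 = {'1': mean_pool(features['1'], graph_ids, 1)}     # vectors: {'1': (1, 32, 3)}
```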

__init__(pooling_type: str = '0')

Initializes the SE(3)-equivariant average pooling layer.

Parameters:

pooling_type (str) – Type of features to pool:

‘0’: Applies standard average pooling for scalar (degree 0) features.
‘1’: Applies component-wise average pooling for vector (degree 1) features.

forward(features: Dict[str, Tensor], G, **kwargs) → Tensor | Dict[str, Tensor]

Forward pass of SE(3)-equivariant graph pooling.

Parameters:

features (Dict[str, torch.Tensor]) – Node features dictionary.
G (dgl.DGLGraph) – DGL graph structure.

Returns:

Pooled features.

Return type:

Union[torch.Tensor, Dict[str, torch.Tensor]]

class SE3MultiHeadAttention(f_value: Fiber, f_key: Fiber, n_heads: int)

SE(3)-Equivariant Multi-Headed Self-Attention for Graph Neural Networks. This layer extends multi-head self-attention (MHA) to SE(3)-equivariant representations. Rather than attending over all token pairs as in standard Transformers, this module computes attention scores over graph edges while preserving SE(3) symmetry. Given a graph G = (V, E), where:

- Nodes store queries (q_i ∈ R^m).
- Edges store keys (k_ij ∈ R^m) and values (v_ij ∈ R^m).

The attention weight is computed from the dot product between the key and the query, followed by a softmax over the neighbors j ∈ N(i):

$$ a_{ij} = \text{softmax}\left( \frac{k_{ij} \cdot q_i}{\sqrt{d_k}} \right) $$

The final node update aggregates the weighted values:

$$ h_i = \sum_{j \in \mathcal{N}(i)} a_{ij} v_{ij} $$

SE(3) equivariance: the equations above are applied separately for each degree (l) of the SE(3) representation, so equivariance is preserved across scalar (l=0), vector (l=1), and higher-degree features.

Usage in SE(3)-Transformers:

This layer replaces MHA in SE(3)-Transformers by applying equivariant self-attention over graph edges.
Used inside GSE3Res blocks for message passing.
Preserves SE(3) symmetry when computing interactions.

Example

>>> import torch
>>> import dgl
>>> import deepchem as dc
>>> from rdkit import Chem
>>> from deepchem.models.torch_models.layers import Fiber, SE3MultiHeadAttention
>>> # Create a molecular graph from SMILES
>>> mol = Chem.MolFromSmiles("CCO")
>>> # Extract SE(3)-equivariant features
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]
>>> #  Convert features into a DGL graph
>>> G = features.to_dgl_graph()
>>> G.edata['w'] = torch.tensor(features.edge_weights, dtype=torch.float32)  # Edge weights
>>> #  Define Fiber Representations for Input & Output
>>> f_value = Fiber(dictionary={0: 16, 1: 32})  # Scalars (degree 0) & Vectors (degree 1)
>>> f_key = Fiber(dictionary={0: 16, 1: 32})  # Same as values for attention
>>> # Initialize `SE3MultiHeadAttention` (Multi-Headed SE(3)-Equivariant Attention)
>>> n_heads = 4
>>> gmab = SE3MultiHeadAttention(f_value, f_key, n_heads)
>>> # Convert Node Features into the Correct Format
>>> v = {str(d): torch.randn(G.num_edges(), f_value.structure_dict[d], 2 * d + 1) for d in f_value.structure_dict}
>>> k = {str(d): torch.randn(G.num_edges(), f_key.structure_dict[d], 2 * d + 1) for d in f_key.structure_dict}
>>> q = {str(d): torch.randn(G.num_nodes(), f_key.structure_dict[d], 2 * d + 1) for d in f_key.structure_dict}
>>> # Apply `SE3MultiHeadAttention` Layer (SE(3)-Equivariant Attention)
>>> output = gmab(v, k=k, q=q, G=G)
>>> for key, tensor in output.items():
...    print(tensor.shape)
torch.Size([3, 16, 1])
torch.Size([3, 32, 3])
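
The two equations above can be sketched for a single head with plain vectors, without DGL: scores on edges, a softmax grouped by destination node, then a weighted sum of the edge values. This NumPy sketch is illustrative only. The `dst` array stands in for the graph, and the per-degree SE(3) bookkeeping of the real layer is left out.

```python
import numpy as np


def graph_attention(q, k, v, dst, num_nodes):
    """Single-head attention over edges.

    q: (num_nodes, m) queries on nodes; k, v: (num_edges, m) keys and values on
    edges; dst[e] is the node that edge e points to. For each node i the scores
    k_ij . q_i / sqrt(m) are normalized over its incoming edges only.
    """
    m = q.shape[-1]
    scores = (k * q[dst]).sum(axis=-1) / np.sqrt(m)
    # Per-destination softmax (subtract each node's max score for stability)
    node_max = np.full(num_nodes, -np.inf)
    np.maximum.at(node_max, dst, scores)
    e = np.exp(scores - node_max[dst])
    denom = np.zeros(num_nodes)
    np.add.at(denom, dst, e)
    a = e / denom[dst]
    # h_i = sum over incoming edges of a_ij * v_ij
    h = np.zeros((num_nodes, v.shape[-1]))
    np.add.at(h, dst, a[:, None] * v)
    return h, a


edges = [(0, 1), (1, 0), (1, 2), (2, 1)]  # (src, dst) for the chain 0-1-2
dst = np.array([d for _, d in edges])
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(len(edges), 8))
v = rng.normal(size=(len(edges), 8))
h, a = graph_attention(q, k, v, dst, num_nodes=3)
```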

__init__(f_value: Fiber, f_key: Fiber, n_heads: int) → None

Initialize the SE(3)-equivariant multi-head self-attention layer.

Parameters:

f_value (Fiber) – Fiber representation of the values in attention.
f_key (Fiber) – Fiber representation of the keys in attention.
n_heads (int) – Number of attention heads.

udf_u_mul_e(d_out: int) → Callable

Compute the weighted sum for a single output feature type.

This function applies the attention weights to the value features during message passing in the graph.

Parameters:: d_out (int) – Degree of the output representation in SE(3) space.
Returns:: A function to be used in DGL’s message-passing framework.
Return type:: function

forward(v: Dict[Any, Tensor], k: Dict[Any, Tensor] | None, q: Dict[Any, Tensor] | None, G, **kwargs) → Dict[str, Tensor]

Forward pass of SE(3)-equivariant multi-headed attention.

Applies equivariant self-attention over graph edges, computing attention scores and updating node features accordingly.

Parameters:

v (Dict) – Dictionary of value tensors indexed by their degree.
k (Dict, optional) – Dictionary of key tensors indexed by their degree (default: None).
q (Dict, optional) – Dictionary of query tensors indexed by their degree (default: None).
G (dgl.DGLGraph) – Graph object containing node and edge features.

Returns:

Dictionary of updated node features indexed by their degree.

Return type:

Dict

class SE3AttentiveSelfInteraction(f_in: Fiber, f_out: Fiber)

A self-interaction layer with an attention mechanism that preserves SE(3) equivariance. This layer applies learnable self-interactions using attention weights to dynamically control the contribution of different feature components. It enables adaptive transformation of SE(3)-equivariant features while preserving equivariance. Instead of applying a fixed transformation to each feature type, this layer computes attention weights based on input feature relationships. These weights determine how much each feature contributes to the output. Mathematically, this can be expressed as:

h_out[d] = softmax(W_d * (h_in[d] ⊗ h_in[d])) * h_in[d]

where:

- h_in[d] is the input feature tensor of degree d.
- W_d is the learnable attention weight matrix for degree d.
- ⊗ denotes the inner product between channels, which yields rotation-invariant self-attention scalars.
- h_out[d] is the transformed output feature tensor.

This enables a data-dependent feature transformation, which is particularly useful for learning complex interactions in SE(3)-equivariant models.

Parameters:

f_in (Fiber) – The input Fiber structure that defines feature multiplicities and degrees.
f_out (Fiber) – The output Fiber structure that defines transformed feature dimensions.

Example

>>> from rdkit import Chem
>>> import dgl
>>> import torch
>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import Fiber, SE3AttentiveSelfInteraction
>>> # Create a molecular graph from SMILES
>>> mol = Chem.MolFromSmiles("CCO")
>>> # Use EquivariantGraphFeaturizer to extract SE(3)-equivariant features
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]
>>> G = features.to_dgl_graph()  # Convert features into a DGL graph
>>> f_in = Fiber(dictionary={0: 16, 1: 32})  # Input fiber: scalars & vectors
>>> f_out = Fiber(dictionary={0: 32, 1: 64})  # Output fiber
>>> # Initialize SE(3)-Equivariant Attentive Self-Interaction Layer
>>> g_att = SE3AttentiveSelfInteraction(f_in, f_out)
>>> # Convert Node Features into the Correct Format
>>> h = {str(d): torch.randn(G.num_nodes(), f_in.structure_dict[d], 2 * d + 1) for d in f_in.structure_dict}
>>> # Apply `SE3AttentiveSelfInteraction` Layer (SE(3)-Equivariant Self-Interaction with Attention)
>>> output = g_att(h)
>>> print(output['0'].shape)
torch.Size([3, 32, 1])
>>> print(output['1'].shape)
torch.Size([3, 64, 3])
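
A simplified NumPy sketch of the formula above for a single degree. The channel inner products form a rotation-invariant Gram matrix, one linear map W_d turns it into m_out x m_in attention logits, and a softmax over input channels mixes the input vectors. The single linear map is a hypothetical stand-in for whatever network the layer actually uses. The last lines check numerically that rotating the input rotates the output.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attentive_self_interaction(h, W):
    """h: (N, m_in, 2d+1) features of one degree; W: (m_out * m_in, m_in * m_in).

    Channel inner products are rotation-invariant, so the attention weights are
    too, and mixing channels with invariant weights keeps the output equivariant.
    """
    n, m_in, _ = h.shape
    gram = np.einsum('nac,nbc->nab', h, h).reshape(n, m_in * m_in)
    att = softmax((gram @ W.T).reshape(n, -1, m_in), axis=-1)   # (N, m_out, m_in)
    return np.einsum('noi,nic->noc', att, h)                    # (N, m_out, 2d+1)


rng = np.random.default_rng(0)
m_in, m_out = 4, 6
h = rng.normal(size=(3, m_in, 3))                   # three nodes, degree-1 features
W = rng.normal(size=(m_out * m_in, m_in * m_in)) * 0.1
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))        # random orthogonal matrix
out = attentive_self_interaction(h, W)
out_rot = attentive_self_interaction(h @ Q.T, W)
```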

__init__(f_in: Fiber, f_out: Fiber) → None: Initializes the SE3AttentiveSelfInteraction layer.

Parameters:

f_in (Fiber) – The input fiber structure defining feature multiplicities and degrees.
f_out (Fiber) – The output fiber structure defining transformed feature dimensions.

forward(features: Dict[str, Tensor], **kwargs: Any) → Dict[str, Tensor]

Forward pass of the SE(3)-equivariant attentive self-interaction layer. This function applies self-attention to each feature type using dynamically computed weights. For each degree d it:

1. Computes feature-wise scalar attention weights.
2. Normalizes the attention weights with a softmax.
3. Applies the normalized weights to the input features.

Parameters:: features (Dict[str, torch.Tensor]) – Dictionary containing input node features for each feature type.
Returns:: Dictionary containing transformed node features for each feature type.
Return type:: Dict[str, torch.Tensor]

class SE3SelfInteraction(f_in: Fiber, f_out: Fiber, learnable: bool = True)

A linear SE(3)-equivariant layer, equivalent to a 1x1 convolution. This layer applies independent linear transformations to each feature type while preserving SE(3) equivariance. It acts as a self-interaction layer by linearly transforming node features without aggregating information from neighbors. Mathematically, this operation is similar to a fully connected transformation, but applied independently to each feature type (degree) in the SE(3)-equivariant representation. This is equivalent to a self-interaction layer in Tensor Field Networks (TFN), where each SE(3)-invariant or equivariant feature undergoes a learned transformation.

Example

>>> from rdkit import Chem
>>> import dgl
>>> import torch
>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import Fiber, SE3SelfInteraction
>>> # Create a molecular graph from SMILES
>>> mol = Chem.MolFromSmiles("CCO")
>>> # Use EquivariantGraphFeaturizer to extract SE(3)-equivariant features
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]
>>> G = features.to_dgl_graph()  # Convert features into a DGL graph
>>> # Define Fiber Representations for Input & Output
>>> f_in = Fiber(dictionary={0: 16, 1: 32})  # Input fiber: scalars & vectors
>>> f_out = Fiber(dictionary={0: 32, 1: 64})  # Output fiber
>>> # Initialize SE(3)-Equivariant 1x1 Convolution Layer
>>> g1x1 = SE3SelfInteraction(f_in, f_out)
>>> # Convert Node Features into the Correct Format
>>> h = {str(d): torch.randn(G.num_nodes(), f_in.structure_dict[d], 2 * d + 1) for d in f_in.structure_dict}
>>> # Apply `SE3SelfInteraction` Layer (SE(3)-Equivariant Linear Transformation)
>>> output = g1x1(h)
>>> print(output['0'].shape)
torch.Size([3, 32, 1])
>>> print(output['1'].shape)
torch.Size([3, 64, 3])
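
The per-degree linear map h_out[d] = W_d h_in[d] can be sketched in NumPy with one einsum per degree. Random matrices stand in for the learned W_d. Because only the channel axis is mixed, the 2d+1 components transform exactly as the input does; the last lines check this for degree 1 with a random orthogonal matrix. The `self_interaction` helper is hypothetical.

```python
import numpy as np


def self_interaction(features, weights):
    """h_out[d] = W_d h_in[d] for every degree d.

    features[d]: (N, m_in, 2d+1); weights[d]: (m_out, m_in). Only the channel
    axis is mixed, so the 2d+1 components transform exactly as the input does.
    """
    return {d: np.einsum('oi,nic->noc', weights[d], h) for d, h in features.items()}


rng = np.random.default_rng(0)
f_in, f_out = {0: 16, 1: 32}, {0: 32, 1: 64}
num_nodes = 3
h = {str(d): rng.normal(size=(num_nodes, c, 2 * d + 1)) for d, c in f_in.items()}
W = {str(d): rng.normal(size=(f_out[d], f_in[d])) for d in f_in}
out = self_interaction(h, W)
# Equivariance check for degree 1: rotating the input rotates the output
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
out_rot = self_interaction({'1': h['1'] @ Q.T}, W)['1']
```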

__init__(f_in: Fiber, f_out: Fiber, learnable: bool = True) → None: Initializes the SE3SelfInteraction layer.

Parameters:

f_in (Fiber) – The input Fiber structure that defines the feature multiplicities and degrees.
f_out (Fiber) – The output Fiber structure that defines the transformed feature dimensions.
learnable (bool, optional) – If True, the transformation matrices are trainable parameters.

forward(features: Dict[str, Tensor], **kwargs: Any) → Dict[str, Tensor]

Forward pass of the SE(3)-equivariant 1x1 convolution. This function applies a linear transformation separately to each feature type (degree) in the SE(3)-equivariant representation. Mathematically, this operation is performed as:

h_out[d] = W_d * h_in[d]

where:

- h_in[d] is the input feature tensor of type d.
- W_d is the learned transformation matrix for type d.
- h_out[d] is the transformed feature tensor of type d.

Parameters:: features (Dict[str, torch.Tensor]) – Dictionary containing input node features for each feature type.

Returns:: Dictionary containing transformed node features for each feature type.
Return type:: Dict[str, torch.Tensor]

class SE3GraphConv(f_in: Fiber, f_out: Fiber, self_interaction: bool = False, edge_dim: int = 0, flavor: str = 'skip')

A graph convolutional layer that is equivariant under SE(3) transformations.

This layer performs message passing between nodes while ensuring that the learned representations remain consistent under translations and rotations. The convolution is defined using a basis derived from spherical harmonics, which naturally respects SE(3) symmetry.

The layer updates each node’s features by aggregating information from its neighbors while applying transformation matrices that depend on the relative positions between nodes. This is similar to standard graph convolution but adapted to work with SE(3)-equivariant features.

Example

This example demonstrates how to use SE3GraphConv for SE(3)-equivariant graph convolutions with molecular graph data.

>>> from rdkit import Chem
>>> import dgl
>>> import torch
>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import SE3GraphConv, Fiber
>>> from deepchem.utils.equivariance_utils import get_equivariant_basis_and_r

>>> # Create a molecular graph from SMILES
>>> mol = Chem.MolFromSmiles('CCO')

>>> # Use EquivariantGraphFeaturizer to extract SE(3)-equivariant features
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]
>>> G = features.to_dgl_graph()  # Convert features into a DGL graph
>>> G.edata['w'] = torch.tensor(features.edge_weights, dtype=torch.float32)  # Edge weights

>>> # Define SE(3) Basis & Radial Functions
>>> basis, r = get_equivariant_basis_and_r(G, max_degree=2)

>>> # Add edge features (concatenating radial & weight features)
>>> if "w" in G.edata.keys():
...     w = G.edata["w"]
...     feat = torch.cat([w, r], -1)  # Concatenate weight + distance
... else:
...     feat = torch.cat([r], -1)  # Use only radial distances

>>> # Define Fiber Representations for Input & Output
>>> f_in = Fiber(dictionary={0: 16, 1: 32})  # Input fiber: scalars & vectors
>>> f_out = Fiber(dictionary={0: 32, 1: 64})  # Output fiber

>>> # Initialize SE(3)-Equivariant Graph Convolution Layer
>>> edge_dim = 4
>>> gconv = SE3GraphConv(f_in, f_out, self_interaction=True, edge_dim=edge_dim)
>>> h = {str(d): torch.randn(G.num_nodes(), f_in.structure_dict[d], 2*d + 1) for d in f_in.structure_dict}
>>> # Apply `SE3GraphConv` Layer (SE(3)-Equivariant Convolution)
>>> output = gconv(h, G=G, r=r, basis=basis)
>>> print(output['0'].shape)
torch.Size([3, 32, 1])
>>> print(output['1'].shape)
torch.Size([3, 64, 3])

References

__init__(f_in: Fiber, f_out: Fiber, self_interaction: bool = False, edge_dim: int = 0, flavor: str = 'skip') → None[source]¶

Parameters:

f_in (Fiber) – The input features, defined as (multiplicity, type) pairs.
f_out (Fiber) – The output features, defined as (multiplicity, type) pairs.
self_interaction (bool, optional) – If True, includes a learnable self-interaction term that allows the node to retain part of its original information.
edge_dim (int, optional) – Number of edge feature dimensions, affecting how edges influence the transformation.
flavor (str, optional) – Specifies the variant of convolution: ‘TFN’ for Tensor Field Network-style convolution, or ‘skip’ to include a residual connection.

udf_u_mul_e(d_out: int) → Callable[source]¶

Computes the convolution for a single output feature type.

This function is designed as a User Defined Function (UDF) in DGL for performing message passing.

Parameters:: d_out (int) – The degree/type of the output feature.
Returns:: A function that computes the message-passing operation.
Return type:: Callable

forward(h: Dict[str, Tensor], G, r: Tensor, basis: Dict[str, Tensor], **kwargs: Any) → Dict[str, Tensor][source]¶

Forward pass of the SE(3)-equivariant graph convolution.

This function updates node features by aggregating messages from neighboring nodes using equivariant convolutions.

The convolution operation consists of:

1. Computing pairwise convolution kernels based on spherical harmonics.
2. Using these kernels to compute messages between connected nodes.
3. Applying an optional self-interaction term.
4. Aggregating incoming messages using mean pooling.

Parameters:

h (Dict[str, torch.Tensor]) – Dictionary of node feature tensors.
G (dgl.DGLGraph) – Input graph representation.
r (torch.Tensor) – Pairwise distance tensor between nodes.
basis (Dict[str, torch.Tensor]) – Dictionary containing the SE(3)-equivariant convolution basis.

Returns:

Updated node features after convolution.

Return type:

Dict[str, torch.Tensor]
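The forward pass described above can be illustrated with a deliberately reduced NumPy sketch. It keeps only degree-0 (scalar) features, replaces the learned radial network and spherical-harmonic basis with a fixed Gaussian of the inter-node distance, and uses hypothetical weight matrices w_msg and w_self; it is not DeepChem's implementation. Because the kernel depends only on distances, the output is unchanged when all coordinates are rotated and translated.

```python
import numpy as np

def scalar_graph_conv(h, pos, edges, w_msg, w_self, radial):
    """Degree-0 -> degree-0 graph convolution with mean aggregation (sketch).

    h: (N, C_in) scalar node features; pos: (N, 3) coordinates;
    edges: list of (src, dst) pairs; w_msg, w_self: (C_out, C_in) weights;
    radial: maps a distance to a scalar kernel value.
    """
    n = h.shape[0]
    out = np.zeros((n, w_msg.shape[0]))
    counts = np.zeros(n)
    for src, dst in edges:
        dist = np.linalg.norm(pos[dst] - pos[src])   # kernel sees only |x_dst - x_src|
        out[dst] += radial(dist) * (w_msg @ h[src])  # message from src to dst
        counts[dst] += 1
    out /= np.maximum(counts, 1)[:, None]            # mean over incoming messages
    return out + h @ w_self.T                        # self-interaction term

def radial(dist):
    return np.exp(-dist ** 2)                        # fixed stand-in for a learned radial network

rng = np.random.default_rng(1)
pos = rng.normal(size=(3, 3))
h = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]             # a 3-node chain, both directions
w_msg, w_self = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))

# Random proper rotation (det = +1) followed by a translation.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
q *= np.sign(np.linalg.det(q))
moved = pos @ q.T + np.array([1.0, -2.0, 0.5])

out = scalar_graph_conv(h, pos, edges, w_msg, w_self, radial)
out_moved = scalar_graph_conv(h, moved, edges, w_msg, w_self, radial)
print(out.shape, np.allclose(out, out_moved))        # (3, 5) True
```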

class SE3GraphNorm(fiber: Fiber, nonlin=ReLU(inplace=True), num_layers: int = 0)[source]¶

SE(3)-Equivariant Graph Normalization Layer.

This layer applies graph-based feature normalization while preserving SE(3) equivariance. It normalizes features per feature type (degree) while maintaining phase information, allowing nonlinear transformations without breaking symmetry constraints.

Given a feature tensor h[d] of degree d, we decompose it into:

h[d] = norm(h[d]) * phase(h[d])

norm(h[d]): L2 norm of the feature tensor.
phase(h[d]): The unit-normed feature vector preserving orientation.
f(norm): A learnable function (MLP) that transforms the norms.

The transformation is computed as:

h_out[d] = f(norm(h[d])) * phase(h[d])

This ensures that features remain equivariant under SE(3) transformations.
The function f(norm) is learnable, allowing the model to adaptively scale different feature norms, providing a powerful form of data-dependent nonlinearity.
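The following NumPy sketch illustrates the norm-based nonlinearity h_out[d] = f(norm(h[d])) * phase(h[d]). It assumes f is a single per-channel affine map followed by ReLU, with hypothetical parameters scale and bias; the actual layer builds f from num_layers linear layers and the nonlin module, and may normalize the norms differently. The check at the end confirms that rotating degree-1 inputs rotates the outputs in the same way.

```python
import numpy as np

def norm_nonlinearity(features, scale, bias, eps=1e-12):
    """Sketch of h_out[d] = f(norm(h[d])) * phase(h[d]).

    features: dict degree (str) -> (N, C, 2d+1)
    scale, bias: dict degree -> (C,) hypothetical per-channel parameters of f,
    where f(norm) = ReLU(scale * norm + bias) is an assumption of this sketch.
    """
    out = {}
    for d, h in features.items():
        norm = np.linalg.norm(h, axis=-1, keepdims=True)   # (N, C, 1), rotation invariant
        phase = h / np.maximum(norm, eps)                   # unit-norm direction
        f_norm = np.maximum(scale[d][None, :, None] * norm + bias[d][None, :, None], 0.0)
        out[d] = f_norm * phase
    return out

rng = np.random.default_rng(2)
features = {"0": rng.normal(size=(3, 16, 1)), "1": rng.normal(size=(3, 32, 3))}
scale = {d: rng.normal(size=h.shape[1]) for d, h in features.items()}
bias = {d: rng.normal(size=h.shape[1]) for d, h in features.items()}

# Random proper rotation applied to the degree-1 vectors.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
q *= np.sign(np.linalg.det(q))

out = norm_nonlinearity(features, scale, bias)
out_rot = norm_nonlinearity({"0": features["0"], "1": features["1"] @ q.T}, scale, bias)
print(out["1"].shape)                              # (3, 32, 3)
print(np.allclose(out_rot["1"], out["1"] @ q.T))   # True
```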

Example

>>> import torch
>>> import dgl
>>> from rdkit import Chem
>>> import deepchem as dc
>>> from deepchem.models.torch_models.layers import Fiber, SE3GraphNorm
>>> # Create a molecular graph from SMILES
>>> mol = Chem.MolFromSmiles("CCO")
>>> # Use EquivariantGraphFeaturizer to extract SE(3)-equivariant features
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]
>>> G = features.to_dgl_graph()  # Convert features into a DGL graph
>>> # Define Fiber Representations for Input Features
>>> f_in = Fiber(dictionary={0: 16, 1: 32})  # Scalars (degree 0) and vectors (degree 1)
>>>
>>> # Initialize SE(3)-Equivariant Graph Normalization Layer
>>> gnorm = SE3GraphNorm(f_in)
>>> h = {str(d): torch.randn(G.num_nodes(), f_in.structure_dict[d], 2 * d + 1) for d in f_in.structure_dict}
>>> # Apply `SE3GraphNorm` Layer (SE(3)-Equivariant Normalization)
>>> output = gnorm(h)
>>> for key, tensor in output.items():
...     print(tensor.shape)
torch.Size([3, 16, 1])
torch.Size([3, 32, 3])

References

__init__(fiber: Fiber, nonlin=ReLU(inplace=True), num_layers: int = 0) → None[source]¶

Initialize the SE(3)-equivariant normalization layer.

Parameters:

fiber (Fiber) – Fiber structure defining feature types and multiplicities.
nonlin (nn.Module) – Nonlinearity applied to transformed norms (default: ReLU).
num_layers (int) – Number of linear transformation layers in f(norm).

forward(features: Dict[str, Tensor], **kwargs)[source]¶

Forward pass of SE(3)-equivariant normalization.

Parameters:

features (Dict[str, torch.Tensor]) – Dictionary of input node features, where keys represent degrees.

Returns:

Dictionary of normalized and transformed node features.

Return type:

Dict[str, torch.Tensor]

class SE3PartialEdgeConv(f_in: Fiber, f_out: Fiber, edge_dim: int = 0, x_ij=None)[source]¶

Graph SE(3)-equivariant node-to-edge partial convolution layer.

This layer applies a partial SE(3)-equivariant convolution, mapping node features to edge features while preserving SE(3) symmetry.

The operation:

Computes a transformation of node features into edge features.
Unlike SE3GraphConv, does not sum over input channels, which makes it useful for computing value embeddings in attention mechanisms.
Optionally integrates relative position embeddings between nodes.

The partial convolution operation follows:

e_ij = Σ W_d ( h_i ⊗ h_j )

where:

h_i, h_j are node features of degrees (d_in).
W_d is a trainable convolution kernel per degree.
⊗ represents feature-channel-wise multiplication.
e_ij is the output edge feature of degree (d_out).

This unfolded structure makes it suitable for computing value embeddings in attention-based SE(3)-equivariant models.

The layer supports injecting relative positions between connected nodes in the graph, with two modes:

Concatenation (x_ij=’cat’): Appends relative positions as a new feature.
Addition (x_ij=’add’): Directly modifies the existing vector feature.
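A minimal NumPy sketch of the two x_ij modes for degree-1 inputs follows. inject_relative_positions is a hypothetical helper, not part of DeepChem, and the choice to add the relative position to the first vector channel in ‘add’ mode is an assumption of this sketch. Relative positions are computed per edge as x_dst - x_src.

```python
import numpy as np

def inject_relative_positions(h_src, rel, mode):
    """Illustrative sketch of the x_ij modes for degree-1 edge inputs.

    h_src: (E, C, 3) degree-1 features of each edge's source node
    rel:   (E, 3) relative positions x_dst - x_src
    mode:  'cat' appends rel as an extra vector channel; 'add' adds it to the
           first existing vector channel (ASSUMPTION); None leaves h_src unchanged.
    """
    if mode == "cat":
        return np.concatenate([h_src, rel[:, None, :]], axis=1)   # (E, C + 1, 3)
    if mode == "add":
        out = h_src.copy()
        out[:, 0, :] += rel                                       # (E, C, 3)
        return out
    return h_src

rng = np.random.default_rng(3)
pos = rng.normal(size=(3, 3))
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1]])     # (src, dst) pairs
h1 = rng.normal(size=(3, 32, 3))                       # degree-1 node features
h_src = h1[edges[:, 0]]                                # gather source features per edge
rel = pos[edges[:, 1]] - pos[edges[:, 0]]

cat = inject_relative_positions(h_src, rel, "cat")
add = inject_relative_positions(h_src, rel, "add")
print(cat.shape, add.shape)   # (4, 33, 3) (4, 32, 3)
```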

Example

>>> import torch
>>> import dgl
>>> import deepchem as dc
>>> from rdkit import Chem
>>> from deepchem.models.torch_models.layers import Fiber, SE3PartialEdgeConv
>>> from deepchem.utils.equivariance_utils import get_equivariant_basis_and_r

>>> # Create a molecular graph from SMILES
>>> mol = Chem.MolFromSmiles("CCO")

>>> # Extract SE(3)-equivariant features
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]

>>> # Convert extracted features into a DGL graph
>>> G = features.to_dgl_graph()  # Convert features into a DGL graph
>>> G.edata['w'] = torch.tensor(features.edge_weights, dtype=torch.float32)  # Edge weights

>>> # Define SE(3) Basis & Radial Functions
>>> basis, r = get_equivariant_basis_and_r(G, max_degree=3)

>>> # Define Fiber Representations for Input & Output
>>> f_in = Fiber(dictionary={0: 16, 1: 32})  # Scalars (degree 0) & Vectors (degree 1)
>>> f_out = Fiber(dictionary={0: 32, 1: 64})  # Output transformations

>>> # Initialize `SE3PartialEdgeConv`
>>> edge_dim = 4
>>> gconv_partial = SE3PartialEdgeConv(f_in=f_in, f_out=f_out, edge_dim=edge_dim, x_ij=None)

>>> # Convert Node Features into the Correct Format
>>> h = {str(d): torch.randn(G.num_nodes(), f_in.structure_dict[d], 2 * d + 1) for d in f_in.structure_dict}

>>> # Apply `SE3PartialEdgeConv` Layer
>>> output = gconv_partial(h, G=G, r=r, basis=basis)

>>> for key, tensor in output.items():
...    print(tensor.shape)
torch.Size([4, 32, 1])
torch.Size([4, 64, 3])

References

__init__(f_in: Fiber, f_out: Fiber, edge_dim: int = 0, x_ij=None) → None[source]¶

Parameters:

f_in (Fiber) – Input fiber structure (multiplicities and types).
f_out (Fiber) – Output fiber structure.
edge_dim (int, optional) – Dimensionality of edge features. Default: 0.
x_ij (str, optional) – Method to inject relative positions (‘cat’, ‘add’, None).

udf_u_mul_e(d_out: int) → Callable[source]¶

Compute the partial convolution for a single output feature type.

This function is registered as a DGL user-defined function (UDF) for message passing between nodes and edges.

Parameters:: d_out (int) – Output feature degree.
Returns:: The DGL UDF that performs the computation.
Return type:: Function handle

forward(h: Dict[str, Tensor], G, r: Tensor, basis: Dict[str, Tensor], **kwargs: Any) → Dict[str, Tensor][source]¶

Forward pass of the SE(3)-equivariant partial convolution layer.

Parameters:

h (Dict[str, torch.Tensor]) – Node features per degree.
G (dgl.DGLGraph) – Input graph representation.
r (torch.Tensor) – Pairwise inter-atomic distances.
basis (Dict[str, torch.Tensor]) – Precomputed SE(3) basis functions.

Returns:

Transformed edge features per degree.

Return type:

Dict[str, torch.Tensor]

class SE3ResidualAttention(f_in: Fiber, f_out: Fiber, edge_dim: int = 0, div: float = 4, n_heads: int = 1, learnable_skip=True, skip='cat', selfint='1x1', x_ij=None)[source]¶

SE(3)-Equivariant Residual Attention Block for Graph Neural Networks.

This layer applies self-attention over SE(3)-equivariant features while preserving rotation and translation equivariance. SE3ResidualAttention integrates:

SE(3)-equivariant projections (SE3GraphConvPartial, G1x1SE3)
Multi-head attention (GMABSE3)
Skip connections for better gradient flow (GSum, GCat)

Mathematical Formulation: Given:

h as input node features
G as the input graph
W_Q, W_K, W_V as projection matrices for queries, keys, and values

The attention mechanism follows:

q = W_Q * h  # Query
k = W_K * h  # Key
v = W_V * h  # Value

alpha = softmax(q ⋅ kᵀ / sqrt(d))  # Attention weights, where d is the key dimension
z = alpha ⋅ v  # Aggregated output

The final transformation is applied via skip connections:

output = z + h (if the ‘sum’ skip connection is used)

Relevance to SE(3)-Transformers:

Maintains SE(3)-equivariance through equivariant message passing.
Incorporates attention to adaptively weigh interactions between features.
Uses multi-head attention to capture multiple geometric relationships.
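To make the attention formula concrete, here is a single-head NumPy sketch restricted to degree-0 features. It computes queries, keys and values per node, normalizes the scores with a softmax over each node's incoming neighbours only, and applies the ‘sum’ skip connection. This is a simplification: in the actual block, keys and values are edge features produced by an equivariant partial convolution, several heads are used, and higher-degree features also contribute to the dot products. graph_attention and its weight matrices are hypothetical.

```python
import numpy as np

def graph_attention(h, edges, w_q, w_k, w_v):
    """Single-head degree-0 attention over graph neighbours with a 'sum' skip (sketch).

    h: (N, C) node features; edges: list of (src, dst), dst attends over its sources;
    w_q, w_k, w_v: (C, C) projections (square so that z + h is defined).
    """
    q, k, v = h @ w_q.T, h @ w_k.T, h @ w_v.T
    d = q.shape[1]                                   # key dimension
    z = np.zeros_like(v)
    weights = {}
    for dst in range(h.shape[0]):
        srcs = [s for s, t in edges if t == dst]
        if not srcs:
            continue
        logits = k[srcs] @ q[dst] / np.sqrt(d)       # scaled dot products with neighbours
        alpha = np.exp(logits - logits.max())
        alpha /= alpha.sum()                         # softmax over incoming neighbours only
        weights[dst] = alpha
        z[dst] = alpha @ v[srcs]                     # aggregated output
    return z + h, weights                            # 'sum' skip connection

rng = np.random.default_rng(6)
h = rng.normal(size=(3, 8))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = graph_attention(h, edges, w_q, w_k, w_v)
print(out.shape)                                                   # (3, 8)
print({n: round(float(a.sum()), 6) for n, a in weights.items()})   # {0: 1.0, 1: 1.0, 2: 1.0}
```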

Example

>>> import torch
>>> import dgl
>>> import deepchem as dc
>>> from rdkit import Chem
>>> from deepchem.models.torch_models.layers import Fiber, SE3ResidualAttention
>>> from deepchem.utils.equivariance_utils import get_equivariant_basis_and_r
>>> # Create a molecular graph from SMILES
>>> mol = Chem.MolFromSmiles("CCO")
>>> featurizer = dc.feat.EquivariantGraphFeaturizer(fully_connected=False, embeded=True)
>>> features = featurizer.featurize([mol])[0]
>>>
>>> # Convert extracted features into a DGL graph
>>> G = features.to_dgl_graph()  # Convert features into a DGL graph
>>> G.edata["w"] = torch.tensor(features.edge_weights, dtype=torch.float32)
>>>
>>> # Define Fiber Representations
>>> f_in = Fiber(dictionary={0: 16, 1: 32})
>>> f_out = Fiber(dictionary={0: 32, 1: 64})
>>>
>>> # Define SE(3) Basis & Radial Functions
>>> basis, r = get_equivariant_basis_and_r(G, max_degree=2)
>>>
>>> # Initialize SE(3)-Equivariant Residual Attention Layer
>>> gse3_res = SE3ResidualAttention(f_in, f_out, edge_dim=G.edata["w"].shape[-1], div=4, n_heads=4, skip="sum")
>>>
>>> # Convert Node Features into the Correct Format
>>> h = {str(d): torch.randn(G.num_nodes(), f_in.structure_dict[d], 2 * d + 1) for d in f_in.structure_dict}
>>>
>>> # Apply `SE3ResidualAttention` Layer (SE(3)-Equivariant Residual Attention)
>>> output = gse3_res(h, G, r=r, basis=basis)
>>> for key, tensor in output.items():
...     print(tensor.shape)
torch.Size([3, 32, 1])
torch.Size([3, 64, 3])

References

__init__(f_in: Fiber, f_out: Fiber, edge_dim: int = 0, div: float = 4, n_heads: int = 1, learnable_skip=True, skip='cat', selfint='1x1', x_ij=None) → None[source]¶

Parameters:

f_in (Fiber) – Fiber defining the input feature types (multiplicities and degrees).
f_out (Fiber) – Fiber defining the output feature types.
edge_dim (int, optional) – Number of edge feature dimensions (default: 0).
div (float, optional) – Division factor for hidden dimension scaling (default: 4).
n_heads (int, optional) – Number of attention heads (default: 1).
learnable_skip (bool, optional) – Whether to make the skip connection learnable (default: True).
skip (str, optional) – Type of skip connection (‘sum’, ‘cat’, or None).
selfint (str, optional) – Type of self-interaction (‘1x1’ or ‘att’).
x_ij (str, optional) – How to handle relative positions (None, ‘cat’, or ‘add’).

forward(features: Dict[str, Tensor], G, **kwargs) → Dict[str, Tensor][source]¶

Forward pass of the SE(3)-equivariant attention block.

Parameters:

features (Dict[str, torch.Tensor]) – Dictionary of input node features, where keys represent degrees.
G (dgl.DGLGraph) – Input graph representation.

Returns:

Updated node features per degree.

Return type:

Dict[str, torch.Tensor]

Readout Layers¶

class GroverReadout(rtype: str = 'mean', in_features: int = 128, attn_hidden_size: int = 32, attn_out_size: int = 32)[source]¶

Performs readout on a batch of graphs.

The readout module is used for performing readouts on batched graphs to convert node embeddings/edge embeddings into graph embeddings. It is used in the Grover architecture to generate a graph embedding from node and edge embeddings. The generated embedding can be used in downstream tasks such as graph classification or graph property prediction.

Parameters:

rtype (str) – Readout type, can be ‘mean’ or ‘self-attention’
in_features (int) – Size of input features
attn_hidden_size (int) – If readout type is attention, size of hidden layer in attention network.
attn_out_size (int) – If readout type is attention, size of attention out layer.

Example

>>> import torch
>>> from deepchem.models.torch_models.readout import GroverReadout
>>> n_nodes, n_features = 6, 32
>>> readout = GroverReadout(rtype="mean")
>>> embedding = torch.ones(n_nodes, n_features)
>>> result = readout(embedding, scope=[(0, 6)])
>>> result.size()
torch.Size([1, 32])

__init__(rtype: str = 'mean', in_features: int = 128, attn_hidden_size: int = 32, attn_out_size: int = 32)[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(graph_embeddings: Tensor, scope: List[List]) → Tensor[source]¶

Given a batch node/edge embedding and a scope list, produce the graph-level embedding by scope.

Parameters:

graph_embeddings (torch.Tensor) – The embedding matrix, num_nodes x in_features or num_edges x in_features.
scope (List[List]) – A list whose elements are [start, range] pairs: start is the index of the first row belonging to a graph and range is the number of rows in that graph, so the graph ends at start + range.

Returns:

graph_embeddings – A stacked tensor containing graph embeddings of shape len(scope) x in_features if readout type is mean or len(scope) x attn_out_size when readout type is self-attention.

Return type:

torch.Tensor
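The scope semantics for the ‘mean’ readout can be sketched in NumPy as follows. mean_readout is a hypothetical helper, not GroverReadout itself, and returning a zero vector for an empty graph is an assumption of this sketch.

```python
import numpy as np

def mean_readout(embeddings, scope):
    """Average the rows of `embeddings` that belong to each graph (sketch).

    embeddings: (num_rows, in_features) stacked node or edge embeddings
    scope:      list of (start, length) pairs, one per graph
    Returns an array of shape (len(scope), in_features).
    """
    graph_vectors = []
    for start, length in scope:
        if length == 0:
            graph_vectors.append(np.zeros(embeddings.shape[1]))   # ASSUMPTION: empty graph -> zeros
        else:
            graph_vectors.append(embeddings[start:start + length].mean(axis=0))
    return np.stack(graph_vectors)

emb = np.arange(12, dtype=float).reshape(6, 2)     # 6 rows, 2 features
out = mean_readout(emb, scope=[(0, 2), (2, 4)])    # graph A: rows 0-1, graph B: rows 2-5
print(out)                                         # rows [1, 2] and [7, 8]
```

With the single scope [(0, 6)] and an all-ones input, as in the example earlier in this section, the sketch likewise returns one row of ones.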

Jax Layers¶

class Linear(num_output: int, initializer: str = 'linear', use_bias: bool = True, bias_init: float = 0.0, name: str = 'linear')[source]¶

Protein folding specific Linear Module.

This differs from the standard Haiku Linear in a few ways:

It supports inputs of arbitrary rank
Initializers are specified by strings

This code is adapted from DeepMind’s AlphaFold code release (https://github.com/deepmind/alphafold).
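The two differences listed above (arbitrary input rank and string-named initializers) can be sketched in NumPy. This is not the Haiku module: init_weights and linear_any_rank are hypothetical helpers, and a plain normal distribution stands in for the truncated normal used in the AlphaFold code. The fan-in scale factors (1 for ‘linear’, 2 for ‘relu’) follow the common variance-scaling convention and are an assumption here.

```python
import numpy as np

def init_weights(initializer, fan_in, num_output, rng):
    """Map an initializer name to a (fan_in, num_output) weight matrix (simplified).

    'linear' -> std sqrt(1 / fan_in); 'relu' -> std sqrt(2 / fan_in); 'zeros' -> all zeros.
    ASSUMPTION: a plain normal is used instead of a truncated normal.
    """
    if initializer == "zeros":
        return np.zeros((fan_in, num_output))
    scale = {"linear": 1.0, "relu": 2.0}[initializer]
    return rng.normal(0.0, np.sqrt(scale / fan_in), size=(fan_in, num_output))

def linear_any_rank(x, w, b):
    """Contract the last axis of x with w; matmul broadcasts over any leading axes."""
    return x @ w + b

rng = np.random.default_rng(4)
w = init_weights("relu", fan_in=28 * 28, num_output=2, rng=rng)
b = np.zeros(2)                                        # bias_init = 0.0
y2 = linear_any_rank(np.ones([8, 28 * 28]), w, b)      # rank-2 input
y3 = linear_any_rank(np.ones([4, 8, 28 * 28]), w, b)   # rank-3 input
print(y2.shape, y3.shape)                              # (8, 2) (4, 8, 2)
```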

Examples

>>> import deepchem as dc
>>> import haiku as hk
>>> import jax
>>> import jax.numpy as jnp
>>> import deepchem.models.jax_models.layers
>>> def forward_model(x):
...   layer = dc.models.jax_models.layers.Linear(2)
...   return layer(x)
>>> f = hk.transform(forward_model)
>>> rng = jax.random.PRNGKey(42)
>>> x = jnp.ones([8, 28 * 28])
>>> params = f.init(rng, x)
>>> output = f.apply(params, rng, x)

__init__(num_output: int, initializer: str = 'linear', use_bias: bool = True, bias_init: float = 0.0, name: str = 'linear')[source]¶

Constructs Linear Module.

Parameters:

num_output (int) – number of output channels.
initializer (str (default 'linear')) – What initializer to use, should be one of {‘linear’, ‘relu’, ‘zeros’}
use_bias (bool (default True)) – Whether to include trainable bias
bias_init (float (default 0)) – Value used to initialize bias.
name (str (default 'linear')) – name of module, used for name scopes.

Density Functional Theory Layers¶

class DFTXC(xcstr: str, nnmodel: Module, aweight0: float = 0.0)[source]¶

This layer initializes the neural network exchange-correlation (XC) functional and the hybrid functional. It is then used to run the Kohn-Sham iterations.

Examples

>>> import torch
>>> from deepchem.feat.dft_data import DFTEntry
>>> from deepchem.models.dft.dftxc import DFTXC, _construct_nn_model
>>> e_type = 'ie'
>>> true_val= '0.53411947056'
>>> systems = [{'moldesc': 'N 0 0 0',
...       'basis': '6-311++G(3df,3pd)',
...        'spin': '3'},
...       {'moldesc': 'N 0 0 0',
...       'basis': '6-311++G(3df,3pd)',
...       'charge': 1,
...        'spin': '2'}]
>>> entry = DFTEntry.create(e_type, true_val, systems)
>>> nnmodel = _construct_nn_model(input_size=2, hidden_size=10, n_layers=1,modeltype=1).to(torch.double)
>>> model = DFTXC("lda_x", nnmodel)
>>> output = model([entry])
The 6-311++G(3df,3pd) basis for atomz 7 does not exist, but we will download it
Downloaded to /home/runner/miniconda3/envs/deepchem/lib/python3.8/site-packages/dqc/api/.database/6-311ppg_3df_3pd_/07.gaussian94

__init__(xcstr: str, nnmodel: Module, aweight0: float = 0.0)[source]¶

Parameters:

xcstr (str) – The choice of xc to use. Some of the commonly used ones are: lda_x, lda_c_pw, lda_c_ow, lda_c_pz, lda_xc_lp_a, lda_xc_lp_b.
nnmodel (torch.nn.Module) – the PyTorch model implementing the calculation
aweight0 (float (default 0.0)) – The weight of the neural network model in the final result.

Notes

It is not necessary to use the default method (_construct_nn_model) with the XCModel.

forward(inputs)[source]¶

Parameters:: inputs (list) – list of entry objects that have been defined using DFTEntry
Returns:: output – Calculated value of the data point after running the Kohn-Sham iterations using the neural network XC functional.
Return type:: list of torch.Tensor

InceptionV3 Layers¶

MobileNetV2 Layers¶

class InvertedResidual(inp: int, oup: int, stride: int, expand_ratio: float)[source]¶

Inverted Residual block used in MobileNetV2 architecture.

This block uses a combination of pointwise, depthwise, and another pointwise convolution with optional residual connections based on input/output channels and stride.

Parameters:

inp (int) – Number of input channels.
oup (int) – Number of output channels.
stride (int) – Stride for depthwise convolution. Must be 1 or 2.
expand_ratio (float) – Expansion ratio for the hidden dimension. If 1, the input is not expanded.

Attributes:

use_res_connect (bool) – Whether to use the residual connection.
conv (nn.Sequential) – The core convolutional operations in the block.
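The bookkeeping behind use_res_connect and the output shape can be sketched in plain Python. The sketch assumes the standard MobileNetV2 layout (a 1x1 expansion to round(inp * expand_ratio) channels, a 3x3 depthwise convolution with padding 1, and a 1x1 projection to oup channels); inverted_residual_config is a hypothetical helper, and the DeepChem block may differ in detail.

```python
def inverted_residual_config(inp, oup, stride, expand_ratio, height, width):
    """Return (hidden_dim, use_res_connect, output_shape) for an inverted residual block.

    ASSUMPTION: standard MobileNetV2 layout with a 3x3 depthwise conv, padding 1.
    """
    if stride not in (1, 2):
        raise ValueError("stride must be 1 or 2")
    hidden_dim = int(round(inp * expand_ratio))       # width of the expanded representation
    use_res_connect = stride == 1 and inp == oup      # the input can only be added back if shapes match
    out_h = (height + 2 * 1 - 3) // stride + 1        # depthwise 3x3, padding 1
    out_w = (width + 2 * 1 - 3) // stride + 1
    return hidden_dim, use_res_connect, (oup, out_h, out_w)

print(inverted_residual_config(16, 16, 1, 1, 32, 32))   # (16, True, (16, 32, 32))
print(inverted_residual_config(16, 24, 2, 6, 32, 32))   # (96, False, (24, 16, 16))
```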

Examples

>>> import torch
>>> x = torch.randn(1, 16, 32, 32)
>>> block = InvertedResidual(inp=16, oup=16, stride=1, expand_ratio=1)
>>> out = block(x)
>>> out.shape
torch.Size([1, 16, 32, 32])

__init__(inp: int, oup: int, stride: int, expand_ratio: float) → None[source]¶: Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the InvertedResidual block.

Parameters:: x (Tensor) – Input tensor of shape (N, C, H, W)
Returns:: Output tensor of shape (N, oup, H_out, W_out)
Return type:: Tensor

ChemCeption Layers¶

class Stem(in_channels: int, out_channels: int)[source]¶

Implements the Stem Layer as defined in https://arxiv.org/abs/1710.02238.

This layer serves as the initial processing block in ChemCeption, downsampling input images to reduce computational complexity before they pass through deeper network layers. The convolutional layer with stride 2 helps in feature extraction while reducing spatial dimensions.

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import Stem
>>> in_channels = 3
>>> out_channels = 4
>>> input_tensor = np.random.rand(1, in_channels, 32, 32).astype(np.float32)  # (Batch, Channels, Height, Width)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = Stem(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 4, 15, 15])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Stem layer.

Parameters:

in_channels (int) – The number of channels in the input tensor.
out_channels (int) – The number of filters applied in the convolution operation.

forward(inputs: Tensor) → Tensor[source]¶

Forward pass of the Stem layer.

Parameters:: inputs (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W).
Returns:: Output tensor of shape (batch_size, out_channels, H_out, W_out), where H_out and W_out are reduced due to downsampling. The output is a feature map with extracted spatial representations.
Return type:: torch.Tensor
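The reduced H_out and W_out follow the usual convolution output-size formula, floor((H + 2p - k) / s) + 1. The kernel size is not stated on this page; the sketch below assumes a 4x4 kernel with stride 2 and no padding, which reproduces the 32 to 15 reduction in the example above. conv_output_size is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

# ASSUMPTION: 4x4 kernel, stride 2, no padding (consistent with the Stem example's shape).
print(conv_output_size(32, kernel=4, stride=2))   # 15
print(conv_output_size(80, kernel=4, stride=2))   # 39
```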

class InceptionResnetA(in_channels: int, out_channels: int)[source]¶

Implements the Inception-ResNet-A block from the Inception-ResNet architecture as described in https://arxiv.org/abs/1710.0223.

This block combines multiple convolutional branches with varying receptive fields, concatenates their outputs, projects them back to the input dimensions using a 1x1 convolution, and adds the result to the original input (residual connection). A ReLU activation is applied at the end.
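The merge step described above (concatenate branches, 1x1 projection back to in_channels, residual addition, ReLU) can be sketched in NumPy. A 1x1 convolution is a matrix multiply over the channel axis at every pixel. The branch outputs here are random stand-ins rather than real convolution branches, and the channel counts and helper names are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution as a per-pixel channel mix. x: (N, C_in, H, W), w: (C_out, C_in)."""
    return np.einsum("oc,nchw->nohw", w, x)

def residual_merge(x, branch_outputs, w_proj):
    """Concatenate branch outputs, project back to C_in channels, add the input, apply ReLU."""
    mixed = np.concatenate(branch_outputs, axis=1)    # (N, sum of branch channels, H, W)
    return np.maximum(x + conv1x1(mixed, w_proj), 0.0)

rng = np.random.default_rng(5)
x = rng.random((1, 64, 28, 28))
branches = [rng.random((1, 32, 28, 28)) for _ in range(3)]   # hypothetical branch outputs
w_proj = rng.normal(size=(64, 96)) * 0.01                    # 96 concatenated -> 64 input channels
out = residual_merge(x, branches, w_proj)
print(out.shape)   # (1, 64, 28, 28)
```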

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import InceptionResnetA
>>> in_channels = 64
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 28, 28).astype(np.float32) # (Batch, Channels, Height, Width)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = InceptionResnetA(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 64, 28, 28])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Inception-ResNet-A block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of filters in the convolutional branches.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Inception-ResNet-A block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W).
Returns:: Output tensor of the same shape as input (batch_size, in_channels, H, W), after applying the Inception-ResNet-A transformations and residual connection.
Return type:: torch.Tensor

class InceptionResnetB(in_channels: int, out_channels: int)[source]¶

Implements the Inception-ResNet-B block from the Inception-ResNet architecture as described in https://arxiv.org/abs/1710.0223.

This block consists of two parallel branches:

A simple 1x1 convolution.
A deeper sequence with asymmetric convolutions (1x7 followed by 7x1) for efficient large receptive field learning.

Outputs from both branches are concatenated and passed through a 1x1 convolution to project back to the original input dimension, and added to the input (residual connection). A ReLU activation follows.
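The benefit of the asymmetric 1x7 followed by 7x1 pair can be checked with simple arithmetic. The sketch below (hypothetical helpers, bias omitted, stride 1) shows that the stacked pair keeps a 7x7 receptive field while using 14 instead of 49 weights per input/output channel pair.

```python
def conv_params(c_in, c_out, kh, kw, bias=True):
    """Number of parameters in a 2-D convolution with a kh x kw kernel."""
    return c_out * c_in * kh * kw + (c_out if bias else 0)

def stacked_receptive_field(kernels):
    """Receptive field (height, width) of stride-1 convolutions applied in sequence."""
    h = 1 + sum(kh - 1 for kh, _ in kernels)
    w = 1 + sum(kw - 1 for _, kw in kernels)
    return h, w

c = 32
full = conv_params(c, c, 7, 7, bias=False)
factorized = conv_params(c, c, 1, 7, bias=False) + conv_params(c, c, 7, 1, bias=False)
print(stacked_receptive_field([(1, 7), (7, 1)]))   # (7, 7)
print(full, factorized)                            # 50176 14336
```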

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import InceptionResnetB
>>> in_channels = 64
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 28, 28).astype(np.float32)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = InceptionResnetB(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 64, 28, 28])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Inception-ResNet-B block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Number of filters used in the convolutional branches.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Inception-ResNet-B block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W)
Returns:: Output tensor of shape (batch_size, in_channels, H, W)
Return type:: torch.Tensor

class ReductionA(in_channels: int, out_channels: int)[source]¶

Implements the Reduction-A block from the Inception-ResNet architecture as described in https://arxiv.org/pdf/1706.06689.

This block reduces the spatial dimensions of the input tensor while increasing the number of feature channels. It consists of three parallel branches:

A max pooling layer.
A single 3x3 convolution with stride 2.
A sequence of 1x1 → 3x3 → 3x3 convolutions, ending with stride 2.

The outputs from all branches are concatenated along the channel dimension.
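A channel-count sketch for Reduction-A follows. The filter counts are an assumption based on the ChemCeption design (int(1.5 * out_channels) filters for each stride-2 convolution), chosen because they reproduce the 99 output channels in the example that follows; the max-pooling branch passes the input channels through unchanged. Both helpers are hypothetical.

```python
def pool_or_conv_size(size, kernel=3, stride=2):
    """Output size of an unpadded ('valid') pooling or convolution along one axis."""
    return (size - kernel) // stride + 1

def reduction_a_channels(in_channels, out_channels):
    """Output channels of a Reduction-A block.

    ASSUMPTION: ChemCeption-style filter counts for the stride-2 layers.
    """
    return (in_channels                  # max pool keeps the input channels
            + int(1.5 * out_channels)    # single 3x3 conv, stride 2
            + int(1.5 * out_channels))   # 1x1 -> 3x3 -> 3x3 (stride 2)

print(reduction_a_channels(3, 32), pool_or_conv_size(4))   # 99 1
```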

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import ReductionA
>>> in_channels = 3
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 4, 4).astype(np.float32)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = ReductionA(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 99, 1, 1])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Reduction-A block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Base number of output filters used in the convolutional branches.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Reduction-A block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W)
Returns:: Output tensor with reduced spatial dimensions and increased channels.
Return type:: torch.Tensor

class ReductionB(in_channels: int, out_channels: int)[source]¶

Implements the Reduction-B block from the Inception-ResNet architecture as described in https://arxiv.org/pdf/1706.06689.

This block aggressively reduces the spatial dimensions while using multiple convolutional branches to preserve rich feature representations.

The four branches are:

Max pooling with stride 2.
1x1 → 3x3 convolution (stride 2).
Another 1x1 → 3x3 convolution (stride 2) with different filter scaling.
A deeper asymmetric 1x1 → 3x1 → 3x3 convolution path (stride 2).

The outputs are concatenated along the channel dimension and passed through a final ReLU.
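The channel count after concatenation can be sketched the same way. The per-branch multipliers below (1.5, 1.125 and 1.25 times out_channels for the final stride-2 layer of each convolutional branch) are an assumption based on the ChemCeption design, chosen because they reproduce the 127 output channels in the example that follows; reduction_b_channels is a hypothetical helper.

```python
def reduction_b_channels(in_channels, out_channels):
    """Output channels of a Reduction-B block.

    ASSUMPTION: ChemCeption-style filter counts for the layer that ends each branch.
    """
    n = out_channels
    return (in_channels          # max pool keeps the input channels
            + int(1.5 * n)       # 1x1 -> 3x3 (stride 2)
            + int(1.125 * n)     # 1x1 -> 3x3 (stride 2), different filter scaling
            + int(1.25 * n))     # deeper asymmetric path ending in a stride-2 3x3

print(reduction_b_channels(3, 32))   # 127
```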

Examples

>>> import numpy as np
>>> import torch
>>> from deepchem.models.torch_models.chemnet_layers import ReductionB
>>> in_channels = 3
>>> out_channels = 32
>>> input_tensor = np.random.rand(1, in_channels, 4, 4).astype(np.float32)
>>> input_tensor_torch = torch.from_numpy(input_tensor)
>>> layer = ReductionB(in_channels, out_channels)
>>> output_tensor = layer(input_tensor_torch)
>>> output_tensor.shape
torch.Size([1, 127, 1, 1])

__init__(in_channels: int, out_channels: int) → None[source]¶

Initializes the Reduction-B block.

Parameters:

in_channels (int) – Number of input channels.
out_channels (int) – Base number of output filters for each convolutional branch.

forward(x: Tensor) → Tensor[source]¶

Forward pass of the Reduction-B block.

Parameters:: x (torch.Tensor) – Input tensor of shape (batch_size, in_channels, H, W)
Returns:: Output tensor with reduced spatial dimensions and increased channels.
Return type:: torch.Tensor
