Data Classes

DeepChem featurizers often transform dataset members into “data classes”. These are classes that hold all the information needed to train a model on a given data point. Models then convert these objects into tensors for training in their default_generator methods.

Graph Convolutions

This section documents the data classes for graph convolutions. We plan to simplify these classes into a joint data representation for all graph convolutions in a future version of DeepChem, so these APIs may not remain stable.

class ConvMol(atom_features, adj_list, max_deg=10, min_deg=0)[source]

Holds information about a molecule.

Re-sorts the atoms internally in order of increasing degree. Note that only heavy atoms (hydrogens excluded) are considered here.

__init__(atom_features, adj_list, max_deg=10, min_deg=0)[source]
Parameters
  • atom_features (np.ndarray) – Has shape (n_atoms, n_feat)

  • adj_list (list) – List of length n_atoms, with neighbor indices of each atom.

  • max_deg (int, optional) – Maximum degree of any atom.

  • min_deg (int, optional) – Minimum degree of any atom.

get_atoms_with_deg(deg)[source]

Retrieves atom_features with the specific degree

get_num_atoms_with_deg(deg)[source]

Returns the number of atoms with the given degree

get_atom_features()[source]

Returns canonicalized version of atom features.

Features are sorted by atom degree, with the original order maintained when degrees are the same.
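The ordering described here can be illustrated with plain NumPy. This is a minimal sketch under stated assumptions, not DeepChem's implementation: the helper name sort_by_degree and the toy molecule are hypothetical, and a stable sort is used so that atoms of equal degree keep their original relative order.

```python
import numpy as np


def sort_by_degree(atom_features, adj_list):
    """Reorder atom features by increasing degree (hypothetical helper)."""
    # The degree of an atom is the number of neighbors in its adjacency entry.
    degrees = np.array([len(nbrs) for nbrs in adj_list])
    # A stable sort preserves the original order among atoms of equal degree.
    order = np.argsort(degrees, kind="stable")
    return atom_features[order], order


# Toy molecule: atom 0 is bonded to atoms 1 and 2, and atom 2 is bonded to atom 3.
adj_list = [[1, 2], [0], [0, 3], [2]]
atom_features = np.arange(8, dtype=float).reshape(4, 2)

sorted_features, order = sort_by_degree(atom_features, adj_list)
print(order)  # degree-1 atoms (1, 3) come first, then degree-2 atoms (0, 2)
```

The returned order array maps each new position to the original atom index, which is what a caller would need in order to relabel neighbor indices consistently.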

get_adjacency_list()[source]

Returns a canonicalized adjacency list.

Canonicalized means that the atoms are re-ordered by degree.

Returns

Canonicalized form of adjacency list.

Return type

list

get_deg_adjacency_lists()[source]

Returns adjacency lists grouped by atom degree.

Returns

Has length (max_deg+1-min_deg). The element at position deg is itself a list of the neighbor-lists for atoms with degree deg.

Return type

list
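To make the grouping concrete, the following sketch buckets neighbor lists by degree using plain Python. It is an illustration only: the function name group_adjacency_by_degree is hypothetical, the neighbor indices are left in their input order rather than canonicalized, and the slot for a given degree is assumed to be deg - min_deg.

```python
def group_adjacency_by_degree(adj_list, max_deg=10, min_deg=0):
    """Group neighbor lists by atom degree (hypothetical helper)."""
    # One bucket per degree in [min_deg, max_deg].
    deg_adj_lists = [[] for _ in range(max_deg + 1 - min_deg)]
    for nbrs in adj_list:
        # Assumes every degree lies within [min_deg, max_deg].
        deg_adj_lists[len(nbrs) - min_deg].append(list(nbrs))
    return deg_adj_lists


# Toy molecule: two atoms of degree 1 and two atoms of degree 2.
adj_list = [[1, 2], [0], [0, 3], [2]]
deg_adj_lists = group_adjacency_by_degree(adj_list, max_deg=3)

for deg, lists in enumerate(deg_adj_lists):
    print(deg, lists)
```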

get_deg_slice()[source]

Returns degree-slice tensor.

The deg_slice tensor allows indexing into a flattened version of the molecule’s atoms. Assume atoms are sorted in order of degree. Then deg_slice[deg][0] is the starting position for atoms of degree deg in the flattened list, and deg_slice[deg][1] is the number of atoms with degree deg.

Note deg_slice has shape (max_deg+1-min_deg, 2).

Returns

deg_slice – Shape (max_deg+1-min_deg, 2)

Return type

np.ndarray
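The start and count columns can be derived from a list of atom degrees. The sketch below is an assumption-laden illustration rather than the library's code: compute_deg_slice is a hypothetical helper, and the start position recorded for a degree with no atoms is simply the running offset, which may differ from what DeepChem stores.

```python
import numpy as np


def compute_deg_slice(degrees, max_deg=10, min_deg=0):
    """Build a (max_deg + 1 - min_deg, 2) array of [start, count] rows."""
    n_slots = max_deg + 1 - min_deg
    # Count how many atoms have each degree (assumes degrees within range).
    counts = np.bincount(np.asarray(degrees) - min_deg, minlength=n_slots)
    # With atoms sorted by degree, each block starts where the previous ends.
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    return np.stack([starts, counts], axis=1)


# Two atoms of degree 1 and two atoms of degree 2.
deg_slice = compute_deg_slice([2, 1, 2, 1], max_deg=3)
print(deg_slice)

# Selecting the degree-2 block from a degree-sorted feature matrix.
start, count = deg_slice[2]
sorted_features = np.arange(8).reshape(4, 2)
deg2_features = sorted_features[start:start + count]
```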

static get_null_mol(n_feat, max_deg=10, min_deg=0)[source]

Constructs a null molecule.

Get one molecule with one atom of each degree, with all the atoms connected to themselves, and containing n_feat features.

Parameters

n_feat (int) – number of features for the nodes in the null molecule

static agglomerate_mols(mols, max_deg=10, min_deg=0)[source]

Concatenates a list of ConvMol objects into one mol object that can be fed into TensorFlow placeholders. The indexing of the molecules is preserved during the combination, but the indexing of the atoms changes substantially.

Parameters

mols (list) – ConvMol objects to be combined into one molecule.

class MultiConvMol(nodes, deg_adj_lists, deg_slice, membership, num_mols)[source]

Holds information about multiple molecules, for use in feeding information into TensorFlow. Generated using the agglomerate_mols function.

__init__(nodes, deg_adj_lists, deg_slice, membership, num_mols)[source]

Initialize self. See help(type(self)) for accurate signature.

class WeaveMol(nodes, pairs, pair_edges)[source]

Molecular featurization object for weave convolutions.

These objects are produced by WeaveFeaturizer, and feed into WeaveModel. The underlying implementation is inspired by [1].

References

[1]

Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of computer-aided molecular design 30.8 (2016): 595-608.

__init__(nodes, pairs, pair_edges)[source]

Initialize self. See help(type(self)) for accurate signature.

class GraphData(node_features: numpy.ndarray, edge_index: numpy.ndarray, edge_features: Optional[numpy.ndarray] = None, node_pos_features: Optional[numpy.ndarray] = None)[source]

GraphData class

This data class is almost the same as torch_geometric.data.Data.

node_features[source]

Node feature matrix with shape [num_nodes, num_node_features]

Type

np.ndarray

edge_index[source]

Graph connectivity in COO format with shape [2, num_edges]

Type

np.ndarray, dtype int

edge_features[source]

Edge feature matrix with shape [num_edges, num_edge_features]

Type

np.ndarray, optional (default None)

node_pos_features[source]

Node position matrix with shape [num_nodes, num_dimensions].

Type

np.ndarray, optional (default None)

num_nodes[source]

The number of nodes in the graph

Type

int

num_node_features[source]

The number of features per node in the graph

Type

int

num_edges[source]

The number of edges in the graph

Type

int

num_edges_features[source]

The number of features per edge in the graph

Type

int, optional (default None)

Examples

>>> import numpy as np
>>> from deepchem.feat.graph_data import GraphData
>>> node_features = np.random.rand(5, 10)
>>> edge_index = np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]], dtype=np.int64)
>>> graph = GraphData(node_features=node_features, edge_index=edge_index)
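In COO format, each column of edge_index describes one directed edge: row 0 holds the source node index and row 1 holds the destination node index. The following NumPy sketch builds such an array from an adjacency list; the helper name adjacency_to_coo is hypothetical and is not part of DeepChem.

```python
import numpy as np


def adjacency_to_coo(adj_list):
    """Convert an adjacency list to a [2, num_edges] COO index array."""
    # Repeat each source index once per neighbor.
    src = [i for i, nbrs in enumerate(adj_list) for _ in nbrs]
    # Flatten the neighbor lists to get the matching destination indices.
    dst = [j for nbrs in adj_list for j in nbrs]
    return np.array([src, dst], dtype=np.int64)


# Path graph 0 - 1 - 2, with each bond stored in both directions.
adj_list = [[1], [0, 2], [1]]
edge_index = adjacency_to_coo(adj_list)
print(edge_index.shape)  # (2, num_edges)
```

An array built this way has the shape and dtype expected for the edge_index argument described above.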
__init__(node_features: numpy.ndarray, edge_index: numpy.ndarray, edge_features: Optional[numpy.ndarray] = None, node_pos_features: Optional[numpy.ndarray] = None)[source]
Parameters
  • node_features (np.ndarray) – Node feature matrix with shape [num_nodes, num_node_features]

  • edge_index (np.ndarray, dtype int) – Graph connectivity in COO format with shape [2, num_edges]

  • edge_features (np.ndarray, optional (default None)) – Edge feature matrix with shape [num_edges, num_edge_features]

  • node_pos_features (np.ndarray, optional (default None)) – Node position matrix with shape [num_nodes, num_dimensions].

to_pyg_graph()[source]

Convert to PyTorch Geometric graph data instance

Returns

Graph data for PyTorch Geometric

Return type

torch_geometric.data.Data

Notes

This method requires PyTorch Geometric to be installed.

to_dgl_graph(self_loop: bool = False)[source]

Convert to DGL graph data instance

Parameters

self_loop (bool) – Whether to add self loops for the nodes, i.e. edges from nodes to themselves. Defaults to False.

Returns

Graph data for DGL

Return type

dgl.DGLGraph

Notes

This method requires DGL to be installed.
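For intuition about what self loops mean in terms of connectivity, the sketch below appends one edge from each node to itself to a COO index array. It uses only NumPy and a hypothetical helper named add_self_loops; it does not show how DeepChem or DGL implement this option internally.

```python
import numpy as np


def add_self_loops(edge_index, num_nodes):
    """Append one i -> i edge per node to a [2, num_edges] index array."""
    loops = np.arange(num_nodes, dtype=edge_index.dtype)
    # Each self loop has the same node as both source and destination.
    return np.concatenate([edge_index, np.stack([loops, loops])], axis=1)


# A single bond between nodes 0 and 1, stored in both directions.
edge_index = np.array([[0, 1], [1, 0]], dtype=np.int64)
with_loops = add_self_loops(edge_index, num_nodes=2)
print(with_loops)
```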

class BatchGraphData(graph_list: Sequence[deepchem.feat.graph_data.GraphData])[source]

Batch GraphData class

node_features[source]

Concatenated node feature matrix with shape [num_nodes, num_node_features]. num_nodes is the total number of nodes in the batch graph.

Type

np.ndarray

edge_index[source]

Concatenated graph connectivity in COO format with shape [2, num_edges]. num_edges is the total number of edges in the batch graph.

Type

np.ndarray, dtype int

edge_features[source]

Concatenated edge feature matrix with shape [num_edges, num_edge_features]. num_edges is the total number of edges in the batch graph.

Type

np.ndarray, optional (default None)

node_pos_features[source]

Concatenated node position matrix with shape [num_nodes, num_dimensions]. num_nodes is the total number of nodes in the batch graph.

Type

np.ndarray, optional (default None)

num_nodes[source]

The number of nodes in the batch graph.

Type

int

num_node_features[source]

The number of features per node in the graph.

Type

int

num_edges[source]

The number of edges in the batch graph.

Type

int

num_edges_features[source]

The number of features per edge in the graph.

Type

int, optional (default None)

graph_index[source]

Vector with shape [num_nodes,] indicating which graph each node belongs to.

Type

np.ndarray, dtype int

Examples

>>> import numpy as np
>>> from deepchem.feat.graph_data import BatchGraphData, GraphData
>>> node_features_list = np.random.rand(2, 5, 10)
>>> edge_index_list = np.array([
...    [[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]],
...    [[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]],
... ], dtype=np.int64)
>>> graph_list = [GraphData(node_features, edge_index) for node_features, edge_index
...           in zip(node_features_list, edge_index_list)]
>>> batch_graph = BatchGraphData(graph_list=graph_list)
__init__(graph_list: Sequence[deepchem.feat.graph_data.GraphData])[source]
Parameters

graph_list (Sequence[GraphData]) – List of GraphData
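The concatenated attributes described for this class can be pictured with a small NumPy sketch. It assumes that batching stacks the node features, shifts each graph's edge indices by the number of nodes that precede it, and records the owning graph of every node. The function batch_graphs is a hypothetical stand-in, not the BatchGraphData implementation.

```python
import numpy as np


def batch_graphs(node_features_list, edge_index_list):
    """Merge several graphs into one disconnected batch graph (sketch)."""
    num_nodes = [x.shape[0] for x in node_features_list]
    # Offset of each graph = number of nodes in all graphs before it.
    offsets = np.concatenate(([0], np.cumsum(num_nodes)[:-1]))
    node_features = np.vstack(node_features_list)
    # Shift edge indices so they point into the stacked node matrix.
    edge_index = np.hstack(
        [ei + off for ei, off in zip(edge_index_list, offsets)])
    # graph_index[i] is the position of the graph that node i came from.
    graph_index = np.repeat(np.arange(len(num_nodes)), num_nodes)
    return node_features, edge_index, graph_index


# Graph A has 2 nodes and 1 edge; graph B has 3 nodes and 2 edges.
features_a, features_b = np.zeros((2, 4)), np.ones((3, 4))
edges_a = np.array([[0], [1]], dtype=np.int64)
edges_b = np.array([[0, 1], [1, 2]], dtype=np.int64)

node_features, edge_index, graph_index = batch_graphs(
    [features_a, features_b], [edges_a, edges_b])
print(edge_index)
print(graph_index)
```

Because no edges connect nodes from different graphs, a model can process the whole batch as one graph and use graph_index to pool node outputs per molecule.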