Data Classes

DeepChem featurizers often transform each data point into a “data class”: a class that holds all the information needed to train a model on that data point. Models then convert these objects into tensors for training in their default_generator methods.

Graph Convolutions

This section documents the data classes for graph convolutions. We plan to consolidate these classes into a single data representation for all graph convolutions in a future version of DeepChem, so these APIs may not remain stable.

class deepchem.feat.mol_graphs.ConvMol(atom_features, adj_list, max_deg=10, min_deg=0)

Holds information about a molecule.

Re-sorts the atoms internally in order of increasing degree. Only heavy atoms are considered here; hydrogens are excluded.

__init__(atom_features, adj_list, max_deg=10, min_deg=0)
Parameters:
  • atom_features (np.ndarray) – Has shape (n_atoms, n_feat)
  • adj_list (list) – List of length n_atoms, with neighbor indices of each atom.
  • max_deg (int, optional) – Maximum degree of any atom.
  • min_deg (int, optional) – Minimum degree of any atom.
static agglomerate_mols(mols, max_deg=10, min_deg=0)

Concatenates a list of ConvMol objects into one mol object that can be fed into TensorFlow placeholders. The indexing of the molecules is preserved during the combination, but the indexing of the atoms changes substantially.

Parameters:mols (list) – ConvMol objects to be combined into one molecule.
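
To make the indexing remark concrete, here is a minimal NumPy-only sketch. It does not call DeepChem and is not the library's implementation. It assumes that the combined object groups atoms by degree across all molecules and keeps a per-atom membership vector recording the source molecule; the toy degrees and all variable names are hypothetical.

```python
import numpy as np

# Per-atom degrees of two hypothetical molecules, each already sorted by degree.
mol_degrees = [np.array([1, 1, 2]),  # molecule 0: a 3-atom chain
               np.array([1, 1])]     # molecule 1: a 2-atom molecule

# Flatten all atoms and remember which molecule each atom came from.
all_degrees = np.concatenate(mol_degrees)
membership = np.repeat(np.arange(len(mol_degrees)),
                       [len(d) for d in mol_degrees])

# Group atoms by degree across molecules (stable sort keeps ties in order).
order = np.argsort(all_degrees, kind="stable")
agg_degrees = all_degrees[order]
agg_membership = membership[order]

print(agg_degrees.tolist())     # [1, 1, 1, 1, 2]
print(agg_membership.tolist())  # [0, 0, 1, 1, 0]
```

The molecule labels 0 and 1 survive unchanged, while the degree-2 atom of molecule 0 moves from position 2 to position 4 in the combined ordering.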
get_adjacency_list()

Returns a canonicalized adjacency list.

Canonicalized means that the atoms are re-ordered by degree.

Returns:Canonicalized form of adjacency list.
Return type:list
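
As a rough illustration of what re-ordering by degree means for an adjacency list, the following NumPy-only sketch relabels a small example. It does not call DeepChem; the toy molecule, the variable names, and the tie-breaking rule (atoms of equal degree keep their original relative order) are assumptions made for the example.

```python
import numpy as np

# Hypothetical 4-atom star: atom 0 is bonded to atoms 1, 2 and 3.
adj_list = [[1, 2, 3], [0], [0], [0]]

# The degree of an atom is the length of its neighbor list.
degrees = np.array([len(nbrs) for nbrs in adj_list])

# order[k] is the old index of the atom placed at new position k.
order = np.argsort(degrees, kind="stable")

# new_index[old] is the new position of the atom with old index `old`.
new_index = np.empty_like(order)
new_index[order] = np.arange(len(order))

# Reorder the rows and relabel the neighbor indices inside each row.
canonical = [[int(new_index[j]) for j in adj_list[old]] for old in order]

print(canonical)  # [[3], [3], [3], [0, 1, 2]]
```

Both steps are needed: reordering the rows alone would leave neighbor indices pointing at the old atom positions.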
get_atom_features()

Returns canonicalized version of atom features.

Features are sorted by atom degree, with the original order maintained when degrees are the same.

get_atoms_with_deg(deg)

Retrieves the atom features of the atoms with the specified degree.

get_deg_adjacency_lists()

Returns adjacency lists grouped by atom degree.

Returns:Has length (max_deg+1-min_deg). The element at position deg is itself a list of the neighbor-lists for atoms with degree deg.
Return type:list
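
The grouping described above can be sketched in plain Python. This is an illustration under stated assumptions, not DeepChem's code: the input adjacency list, the small max_deg value, and the variable names are hypothetical, and the real method may return arrays rather than nested lists.

```python
# Hypothetical adjacency list, already ordered by increasing degree.
adj_list = [[3], [3], [3], [0, 1, 2]]
max_deg, min_deg = 3, 0  # small bounds chosen for the example

# One bucket per degree in [min_deg, max_deg].
deg_adj_lists = [[] for _ in range(max_deg + 1 - min_deg)]
for nbrs in adj_list:
    deg_adj_lists[len(nbrs) - min_deg].append(nbrs)

print(len(deg_adj_lists))  # 4
print(deg_adj_lists[1])    # [[3], [3], [3]]
print(deg_adj_lists[3])    # [[0, 1, 2]]
```

Degrees with no atoms (0 and 2 here) keep an empty bucket, so the outer list always has max_deg+1-min_deg entries.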
get_deg_slice()

Returns degree-slice tensor.

The deg_slice tensor allows indexing into a flattened version of the molecule’s atoms. Assume the atoms are sorted in order of degree. Then deg_slice[deg][0] is the starting position for atoms of degree deg in the flattened list, and deg_slice[deg][1] is the number of atoms with degree deg.

Note deg_slice has shape (max_deg+1-min_deg, 2).

Returns:deg_slice – Shape (max_deg+1-min_deg, 2)
Return type:np.ndarray
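
A minimal NumPy sketch of how such a (start, count) table can be built from sorted degrees follows. It is an illustration, not the library's implementation: the degree array and the small max_deg are hypothetical, min_deg is taken to be 0, and the start value stored for a degree with no atoms is a choice made for the sketch.

```python
import numpy as np

# Hypothetical per-atom degrees, already sorted in increasing order.
degrees = np.array([1, 1, 1, 3])
max_deg, min_deg = 3, 0  # this sketch assumes min_deg == 0

# Number of atoms for each degree 0..max_deg.
counts = np.bincount(degrees, minlength=max_deg + 1)

# Start of each degree block: cumulative count of all lower degrees.
starts = np.concatenate(([0], np.cumsum(counts)[:-1]))

deg_slice = np.stack([starts, counts], axis=1)

print(deg_slice.tolist())  # [[0, 0], [0, 3], [3, 0], [3, 1]]

# Slice the flattened atom list to recover the atoms of degree 3.
start, size = deg_slice[3]
print(degrees[start:start + size].tolist())  # [3]
```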
static get_null_mol(n_feat, max_deg=10, min_deg=0)

Constructs a null molecule.

Get one molecule with one atom of each degree, with all the atoms connected to themselves, and containing n_feat features.

Parameters:n_feat (int) – number of features for the nodes in the null molecule
get_num_atoms_with_deg(deg)

Returns the number of atoms with the given degree

class deepchem.feat.mol_graphs.MultiConvMol(nodes, deg_adj_lists, deg_slice, membership, num_mols)

Holds information about multiple molecules, for use in feeding information into TensorFlow. Generated using the agglomerate_mols function.

__init__(nodes, deg_adj_lists, deg_slice, membership, num_mols)

Initialize self. See help(type(self)) for accurate signature.

class deepchem.feat.mol_graphs.WeaveMol(nodes, pairs, pair_edges)

Molecular featurization object for weave convolutions.

These objects are produced by WeaveFeaturizer, and feed into WeaveModel. The underlying implementation is inspired by [1].

References

[1] Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of Computer-Aided Molecular Design 30.8 (2016): 595-608.
__init__(nodes, pairs, pair_edges)

Initialize self. See help(type(self)) for accurate signature.

class deepchem.feat.graph_data.GraphData(node_features: numpy.ndarray, edge_index: numpy.ndarray, edge_features: Optional[numpy.ndarray] = None, node_pos_features: Optional[numpy.ndarray] = None)

GraphData class

This data class is almost the same as torch_geometric.data.Data.

node_features

Node feature matrix with shape [num_nodes, num_node_features]

Type:np.ndarray
edge_index

Graph connectivity in COO format with shape [2, num_edges]

Type:np.ndarray, dtype int
edge_features

Edge feature matrix with shape [num_edges, num_edge_features]

Type:np.ndarray, optional (default None)
node_pos_features

Node position matrix with shape [num_nodes, num_dimensions].

Type:np.ndarray, optional (default None)
num_nodes

The number of nodes in the graph

Type:int
num_node_features

The number of features per node in the graph

Type:int
num_edges

The number of edges in the graph

Type:int
num_edges_features

The number of features per edge in the graph

Type:int, optional (default None)

Examples

>>> import numpy as np
>>> from deepchem.feat.graph_data import GraphData
>>> node_features = np.random.rand(5, 10)
>>> edge_index = np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]], dtype=np.int64)
>>> graph = GraphData(node_features=node_features, edge_index=edge_index)
__init__(node_features: numpy.ndarray, edge_index: numpy.ndarray, edge_features: Optional[numpy.ndarray] = None, node_pos_features: Optional[numpy.ndarray] = None)
Parameters:
  • node_features (np.ndarray) – Node feature matrix with shape [num_nodes, num_node_features]
  • edge_index (np.ndarray, dtype int) – Graph connectivity in COO format with shape [2, num_edges]
  • edge_features (np.ndarray, optional (default None)) – Edge feature matrix with shape [num_edges, num_edge_features]
  • node_pos_features (np.ndarray, optional (default None)) – Node position matrix with shape [num_nodes, num_dimensions].
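
For readers unfamiliar with COO connectivity, the sketch below builds an edge_index array of shape [2, num_edges] from a bond list using NumPy only. The bond list and feature sizes are hypothetical, and storing each undirected bond as two directed edges is a common convention assumed here, not something this page specifies.

```python
import numpy as np

# Hypothetical 3-node chain 0 - 1 - 2, given as undirected bonds.
bonds = [(0, 1), (1, 2)]

# Assumption: each undirected bond is stored as two directed edges.
src = [i for i, j in bonds] + [j for i, j in bonds]
dst = [j for i, j in bonds] + [i for i, j in bonds]

# Row 0 holds source nodes, row 1 holds destination nodes.
edge_index = np.array([src, dst], dtype=np.int64)

# Placeholder node features: 3 nodes with 4 features each.
node_features = np.zeros((3, 4))

num_nodes, num_node_features = node_features.shape
num_edges = edge_index.shape[1]

print(edge_index.tolist())  # [[0, 1, 1, 2], [1, 2, 0, 1]]
print(num_nodes, num_node_features, num_edges)  # 3 4 4
```

Arrays shaped like these could then be passed as the node_features and edge_index arguments described above.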
to_dgl_graph()

Convert to DGL graph data instance

Returns:Graph data for DGL
Return type:dgl.DGLGraph

Notes

This method requires DGL to be installed.

to_pyg_graph()

Convert to PyTorch Geometric graph data instance

Returns:Graph data for PyTorch Geometric
Return type:torch_geometric.data.Data

Notes

This method requires PyTorch Geometric to be installed.

class deepchem.feat.graph_data.BatchGraphData(graph_list: Sequence[deepchem.feat.graph_data.GraphData])

Batch GraphData class

node_features

Concatenated node feature matrix with shape [num_nodes, num_node_features]. num_nodes is total number of nodes in the batch graph.

Type:np.ndarray
edge_index

Concatenated graph connectivity in COO format with shape [2, num_edges]. num_edges is total number of edges in the batch graph.

Type:np.ndarray, dtype int
edge_features

Concatenated edge feature matrix with shape [num_edges, num_edge_features]. num_edges is total number of edges in the batch graph.

Type:np.ndarray, optional (default None)
node_pos_features

Concatenated node position matrix with shape [num_nodes, num_dimensions]. num_nodes is total number of nodes in the batch graph.

Type:np.ndarray, optional (default None)
num_nodes

The number of nodes in the batch graph.

Type:int
num_node_features

The number of features per node in the graph.

Type:int
num_edges

The number of edges in the batch graph.

Type:int
num_edges_features

The number of features per edge in the graph.

Type:int, optional (default None)
graph_index

Vector with shape [num_nodes,] that indicates which graph each node belongs to.

Type:np.ndarray, dtype int

Examples

>>> import numpy as np
>>> from deepchem.feat.graph_data import BatchGraphData, GraphData
>>> node_features_list = np.random.rand(2, 5, 10)
>>> edge_index_list = np.array([
...    [[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]],
...    [[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]],
... ], dtype=np.int64)
>>> graph_list = [GraphData(node_features, edge_index) for node_features, edge_index
...           in zip(node_features_list, edge_index_list)]
>>> batch_graph = BatchGraphData(graph_list=graph_list)
__init__(graph_list: Sequence[deepchem.feat.graph_data.GraphData])
Parameters:graph_list (Sequence[GraphData]) – List of GraphData
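
The concatenated attributes listed for this class can be illustrated with a NumPy-only sketch. It is not the library's implementation. It assumes that each graph's node indices are shifted by the number of nodes in the preceding graphs so that edges still point at the right rows; the two toy graphs and all variable names are hypothetical.

```python
import numpy as np

# Two hypothetical graphs, each given as (node_features, edge_index).
graphs = [
    (np.ones((2, 3)), np.array([[0, 1], [1, 0]], dtype=np.int64)),
    (np.zeros((3, 3)), np.array([[0, 1, 2], [1, 2, 0]], dtype=np.int64)),
]

num_nodes_list = [x.shape[0] for x, _ in graphs]

# Offset of each graph's first node in the concatenated node matrix.
offsets = np.concatenate(([0], np.cumsum(num_nodes_list)[:-1]))

# Stack node features row-wise; shift and join edges column-wise.
node_features = np.concatenate([x for x, _ in graphs], axis=0)
edge_index = np.concatenate(
    [e + off for (_, e), off in zip(graphs, offsets)], axis=1)

# graph_index[i] is the position in `graphs` of the graph that owns node i.
graph_index = np.repeat(np.arange(len(graphs)), num_nodes_list)

print(node_features.shape)   # (5, 3)
print(edge_index.tolist())   # [[0, 1, 2, 3, 4], [1, 0, 3, 4, 2]]
print(graph_index.tolist())  # [0, 0, 1, 1, 1]
```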
to_dgl_graph()

Convert to DGL graph data instance

Returns:Graph data for DGL
Return type:dgl.DGLGraph

Notes

This method requires DGL to be installed.

to_pyg_graph()

Convert to PyTorch Geometric graph data instance

Returns:Graph data for PyTorch Geometric
Return type:torch_geometric.data.Data

Notes

This method requires PyTorch Geometric to be installed.