Data Classes
DeepChem featurizers often transform datapoints into “data classes”. These are classes that hold all the information needed to train a model on a given datapoint. Models then convert these into tensors for training in their default_generator methods.
Graph Convolutions
This section documents the data classes used for graph convolutions. We plan to simplify these classes into a single joint data representation for all graph convolutions in a future version of DeepChem, so these APIs may not remain stable.

class deepchem.feat.mol_graphs.ConvMol(atom_features, adj_list, max_deg=10, min_deg=0)
Holds information about a molecule.
Internally reorders the atoms by increasing degree. Only heavy atoms are considered here; hydrogens are excluded.

__init__(atom_features, adj_list, max_deg=10, min_deg=0)
Parameters:
 atom_features (np.ndarray) – Has shape (n_atoms, n_feat).
 adj_list (list) – List of length n_atoms containing the neighbor indices of each atom.
 max_deg (int, optional) – Maximum degree of any atom.
 min_deg (int, optional) – Minimum degree of any atom.

static agglomerate_mols(mols, max_deg=10, min_deg=0)
Concatenates a list of ConvMol objects into a single MultiConvMol object that can be fed into TensorFlow placeholders. The order of the molecules is preserved during the combination, but the atom indices change substantially.
Parameters: mols (list) – ConvMol objects to be combined into one molecule.
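The atom reindexing described above can be illustrated with a minimal pure-Python sketch. It assumes each molecule is given only as an adjacency list; the helper `concat_adjacency_lists` and its `membership` output are hypothetical and not part of the DeepChem API. Each molecule's atom indices are shifted by the number of atoms in the preceding molecules, which keeps the molecules in order while changing the atom indices. DeepChem's agglomerate_mols additionally regroups atoms by degree, which this sketch leaves out.

```python
from typing import List, Tuple


def concat_adjacency_lists(
    adj_lists: List[List[List[int]]],
) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical helper: merge several molecules into one disconnected graph.

    Each molecule's local atom indices are shifted by the number of atoms in
    the molecules before it, so molecule order is preserved while atom
    indices change. Degree-based regrouping (done by DeepChem) is omitted.
    """
    merged: List[List[int]] = []
    membership: List[int] = []  # membership[i] = index of the molecule owning atom i
    offset = 0
    for mol_idx, adj in enumerate(adj_lists):
        for neighbors in adj:
            merged.append([n + offset for n in neighbors])
            membership.append(mol_idx)
        offset += len(adj)
    return merged, membership


# Two toy "molecules": a 2-atom chain and a 3-atom chain.
mol_a = [[1], [0]]
mol_b = [[1], [0, 2], [1]]
merged, membership = concat_adjacency_lists([mol_a, mol_b])
print(merged)      # [[1], [0], [3], [2, 4], [3]]
print(membership)  # [0, 0, 1, 1, 1]
```

Atom 0 of the second molecule becomes atom 2 of the combined graph, which is exactly the kind of index change the description warns about.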

get_adjacency_list()
Returns a canonicalized adjacency list.
Canonicalized means that the atoms are reordered by degree.
Returns: Canonicalized form of the adjacency list.
Return type: list

get_atom_features()
Returns a canonicalized version of the atom features.
Features are sorted by atom degree, with the original order maintained when degrees are equal.
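The degree-based canonicalization used by get_adjacency_list and get_atom_features can be sketched as a stable sort followed by a relabeling of neighbor indices. This is an illustration under assumptions, not DeepChem's implementation: the helper `canonicalize` is hypothetical, and plain Python lists stand in for np.ndarray.

```python
from typing import List, Sequence, Tuple


def canonicalize(
    atom_features: Sequence[Sequence[float]], adj_list: Sequence[Sequence[int]]
) -> Tuple[List[List[float]], List[List[int]]]:
    """Hypothetical sketch of degree-based canonicalization.

    Atoms are reordered by increasing degree; sorted() is stable, so atoms
    with equal degree keep their original relative order. Neighbor indices
    are then relabeled to point at the new atom positions.
    """
    order = sorted(range(len(adj_list)), key=lambda i: len(adj_list[i]))  # order[new] = old
    old_to_new = {old: new for new, old in enumerate(order)}
    features = [list(atom_features[old]) for old in order]
    adjacency = [[old_to_new[n] for n in adj_list[old]] for old in order]
    return features, adjacency


# Heavy atoms of isobutane: atom 0 is the central carbon (degree 3).
# Each feature row holds the atom's original index so the reordering is visible.
features = [[0.0], [1.0], [2.0], [3.0]]
adj = [[1, 2, 3], [0], [0], [0]]
new_features, new_adj = canonicalize(features, adj)
print(new_features)  # [[1.0], [2.0], [3.0], [0.0]]
print(new_adj)       # [[3], [3], [3], [0, 1, 2]]
```

Relabeling the neighbors is what keeps the adjacency list consistent with the reordered feature rows.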

get_deg_adjacency_lists()
Returns adjacency lists grouped by atom degree.
Returns: Has length (max_deg+1-min_deg). The element at position deg is itself a list of the neighbor lists for atoms with degree deg.
Return type: list
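A minimal sketch of grouping neighbor lists by degree, assuming the adjacency list has already been canonicalized. The helper `group_by_degree` is hypothetical; it indexes buckets by deg - min_deg, which coincides with position deg for the default min_deg=0, and it uses plain lists where DeepChem may use arrays.

```python
from typing import List, Sequence


def group_by_degree(
    adj_list: Sequence[Sequence[int]], max_deg: int = 10, min_deg: int = 0
) -> List[List[List[int]]]:
    """Hypothetical sketch: bucket neighbor lists by atom degree.

    Bucket k holds the neighbor lists of atoms with degree min_deg + k, so
    with the default min_deg=0 bucket `deg` holds atoms of degree `deg`.
    """
    buckets: List[List[List[int]]] = [[] for _ in range(max_deg + 1 - min_deg)]
    for neighbors in adj_list:
        deg = len(neighbors)
        if not min_deg <= deg <= max_deg:
            raise ValueError(f"degree {deg} outside [{min_deg}, {max_deg}]")
        buckets[deg - min_deg].append(list(neighbors))
    return buckets


# Canonicalized isobutane: three degree-1 carbons, then the central carbon.
adj = [[3], [3], [3], [0, 1, 2]]
buckets = group_by_degree(adj, max_deg=4)
print(len(buckets))  # 5
print(buckets[1])    # [[3], [3], [3]]
print(buckets[3])    # [[0, 1, 2]]
```

Every neighbor list in a bucket has the same length, which is what allows each degree to be processed as one dense block.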

get_deg_slice()
Returns the degree-slice tensor.
The deg_slice tensor allows indexing into a flattened version of the molecule’s atoms. Assume the atoms are sorted in order of degree. Then deg_slice[deg][0] is the starting position of the atoms of degree deg in the flattened list, and deg_slice[deg][1] is the number of atoms with degree deg.
Note that deg_slice has shape (max_deg+1-min_deg, 2).
Returns: deg_slice – Shape (max_deg+1-min_deg, 2).
Return type: np.ndarray
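The deg_slice layout can be computed from the sorted atom degrees with a running count. The sketch below is hypothetical (the helper `compute_deg_slice` is not part of DeepChem); it assumes the degrees are already in increasing order and returns plain lists of [start, count] rows instead of an np.ndarray.

```python
from typing import List, Sequence


def compute_deg_slice(
    degrees: Sequence[int], max_deg: int = 10, min_deg: int = 0
) -> List[List[int]]:
    """Hypothetical sketch: one [start, count] row per degree.

    Row k describes atoms of degree min_deg + k, so row `deg` matches degree
    `deg` when min_deg=0. `degrees` must already be in increasing order.
    """
    if list(degrees) != sorted(degrees):
        raise ValueError("atoms must be sorted by degree")
    counts = [0] * (max_deg + 1 - min_deg)
    for deg in degrees:
        if not min_deg <= deg <= max_deg:
            raise ValueError(f"degree {deg} outside [{min_deg}, {max_deg}]")
        counts[deg - min_deg] += 1
    deg_slice: List[List[int]] = []
    start = 0
    for count in counts:
        deg_slice.append([start, count])  # start offset, then number of atoms
        start += count
    return deg_slice


# Degrees of the canonicalized isobutane atoms.
deg_slice = compute_deg_slice([1, 1, 1, 3], max_deg=4)
print(deg_slice)  # [[0, 0], [0, 3], [3, 0], [3, 1], [4, 0]]
```

With this layout, the atoms of degree deg occupy positions start through start + count - 1 of the flattened list; here row 3 is [3, 1], so the single degree-3 atom sits at position 3.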


class deepchem.feat.mol_graphs.MultiConvMol(nodes, deg_adj_lists, deg_slice, membership, num_mols)
Holds information about multiple molecules, for feeding into TensorFlow. Generated by the agglomerate_mols function.

class deepchem.feat.mol_graphs.WeaveMol(nodes, pairs, pair_edges)
Molecular featurization object for weave convolutions.
These objects are produced by WeaveFeaturizer and fed into WeaveModel. The underlying implementation is inspired by [1].
References
[1] Kearnes, Steven, et al. “Molecular graph convolutions: moving beyond fingerprints.” Journal of Computer-Aided Molecular Design 30.8 (2016): 595-608.

class deepchem.feat.graph_data.GraphData(node_features: numpy.ndarray, edge_index: numpy.ndarray, edge_features: Optional[numpy.ndarray] = None, node_pos_features: Optional[numpy.ndarray] = None)
GraphData class
This data class is almost the same as torch_geometric.data.Data.

node_features
Node feature matrix with shape [num_nodes, num_node_features].
Type: np.ndarray

edge_index
Graph connectivity in COO format with shape [2, num_edges].
Type: np.ndarray, dtype int

edge_features
Edge feature matrix with shape [num_edges, num_edge_features].
Type: np.ndarray, optional (default None)

node_pos_features
Node position matrix with shape [num_nodes, num_dimensions].
Type: np.ndarray, optional (default None)

num_nodes
The number of nodes in the graph.
Type: int

num_node_features
The number of features per node in the graph.
Type: int

num_edges
The number of edges in the graph.
Type: int

num_edge_features
The number of features per edge in the graph.
Type: int, optional (default None)
Examples
>>> import numpy as np
>>> from deepchem.feat.graph_data import GraphData
>>> node_features = np.random.rand(5, 10)
>>> edge_index = np.array([[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]], dtype=np.int64)
>>> graph = GraphData(node_features=node_features, edge_index=edge_index)
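To make the COO layout concrete, the following sketch converts an edge_index into per-node neighbor lists. It follows the torch_geometric convention, assumed here to carry over to GraphData, that row 0 holds source nodes and row 1 holds target nodes. The helper `coo_to_neighbor_lists` is hypothetical, and nested lists stand in for np.ndarray.

```python
from typing import List, Sequence


def coo_to_neighbor_lists(
    edge_index: Sequence[Sequence[int]], num_nodes: int
) -> List[List[int]]:
    """Hypothetical sketch: read a [2, num_edges] COO edge index.

    Column j encodes one directed edge from edge_index[0][j] (source)
    to edge_index[1][j] (target).
    """
    sources, targets = edge_index
    if len(sources) != len(targets):
        raise ValueError("edge_index rows must have equal length")
    neighbors: List[List[int]] = [[] for _ in range(num_nodes)]
    for src, dst in zip(sources, targets):
        neighbors[src].append(dst)
    return neighbors


# The same 5-node ring as in the GraphData example.
edge_index = [[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]]
num_edges = len(edge_index[0])  # equals edge_index.shape[1] for an array
neighbor_lists = coo_to_neighbor_lists(edge_index, num_nodes=5)
print(num_edges)       # 5
print(neighbor_lists)  # [[1], [2], [3], [4], [0]]
```

Under this convention an undirected bond is stored as two columns, one per direction; the ring above is therefore a directed cycle.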

__init__(node_features: numpy.ndarray, edge_index: numpy.ndarray, edge_features: Optional[numpy.ndarray] = None, node_pos_features: Optional[numpy.ndarray] = None)
Parameters:
 node_features (np.ndarray) – Node feature matrix with shape [num_nodes, num_node_features]
 edge_index (np.ndarray, dtype int) – Graph connectivity in COO format with shape [2, num_edges]
 edge_features (np.ndarray, optional (default None)) – Edge feature matrix with shape [num_edges, num_edge_features]
 node_pos_features (np.ndarray, optional (default None)) – Node position matrix with shape [num_nodes, num_dimensions].
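The shapes listed above imply a few consistency checks between the arguments. The sketch below spells them out with a hypothetical `check_graph_shapes` helper operating on nested lists; whether GraphData itself performs all of these checks is not stated in this documentation.

```python
from typing import Optional, Sequence


def check_graph_shapes(
    node_features: Sequence[Sequence[float]],
    edge_index: Sequence[Sequence[int]],
    edge_features: Optional[Sequence[Sequence[float]]] = None,
    node_pos_features: Optional[Sequence[Sequence[float]]] = None,
) -> None:
    """Hypothetical checker for the documented shape contract."""
    num_nodes = len(node_features)
    if len(edge_index) != 2:
        raise ValueError("edge_index must have shape [2, num_edges]")
    num_edges = len(edge_index[0])
    if len(edge_index[1]) != num_edges:
        raise ValueError("both rows of edge_index need num_edges entries")
    # Every node index in edge_index must refer to an existing node.
    if any(not 0 <= i < num_nodes for row in edge_index for i in row):
        raise ValueError("edge_index refers to a node that does not exist")
    if edge_features is not None and len(edge_features) != num_edges:
        raise ValueError("edge_features must have num_edges rows")
    if node_pos_features is not None and len(node_pos_features) != num_nodes:
        raise ValueError("node_pos_features must have num_nodes rows")


# A 3-node path with one feature per edge passes silently.
check_graph_shapes([[0.0]] * 3, [[0, 1], [1, 2]], edge_features=[[0.5], [0.5]])

# With only 2 nodes, node index 2 in edge_index is out of range.
try:
    check_graph_shapes([[0.0]] * 2, [[0, 1], [1, 2]])
except ValueError as err:
    print(err)  # edge_index refers to a node that does not exist
```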


class deepchem.feat.graph_data.BatchGraphData(graph_list: Sequence[deepchem.feat.graph_data.GraphData])
Batch GraphData class

node_features
Concatenated node feature matrix with shape [num_nodes, num_node_features], where num_nodes is the total number of nodes in the batch graph.
Type: np.ndarray

edge_index
Concatenated graph connectivity in COO format with shape [2, num_edges], where num_edges is the total number of edges in the batch graph.
Type: np.ndarray, dtype int

edge_features
Concatenated edge feature matrix with shape [num_edges, num_edge_features], where num_edges is the total number of edges in the batch graph.
Type: np.ndarray, optional (default None)

node_pos_features
Concatenated node position matrix with shape [num_nodes, num_dimensions], where num_nodes is the total number of nodes in the batch graph.
Type: np.ndarray, optional (default None)

num_nodes
The number of nodes in the batch graph.
Type: int

num_node_features
The number of features per node in the graph.
Type: int

num_edges
The number of edges in the batch graph.
Type: int

num_edge_features
The number of features per edge in the graph.
Type: int, optional (default None)

graph_index
A vector with shape [num_nodes,] indicating which graph each node belongs to.
Type: np.ndarray, dtype int
Examples
>>> import numpy as np
>>> from deepchem.feat.graph_data import GraphData, BatchGraphData
>>> node_features_list = np.random.rand(2, 5, 10)
>>> edge_index_list = np.array([
...     [[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]],
...     [[0, 1, 2, 3, 4], [1, 2, 3, 4, 0]],
... ], dtype=np.int64)
>>> graph_list = [GraphData(node_features, edge_index) for node_features, edge_index
...               in zip(node_features_list, edge_index_list)]
>>> batch_graph = BatchGraphData(graph_list=graph_list)
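A concatenated edge_index is only meaningful if each graph's node indices are shifted past the nodes of the graphs before it. The sketch below shows that bookkeeping, together with graph_index, for graphs stored as plain lists; the helper `batch_graphs` is hypothetical and is not DeepChem's implementation of BatchGraphData.

```python
from typing import List, Sequence, Tuple

Graph = Tuple[List[List[float]], List[List[int]]]  # (node_features, edge_index)


def batch_graphs(
    graphs: Sequence[Graph],
) -> Tuple[List[List[float]], List[List[int]], List[int]]:
    """Hypothetical sketch of batching (node_features, edge_index) pairs.

    Node features are concatenated, each graph's edge indices are shifted by
    the number of nodes in the preceding graphs, and graph_index records the
    source graph of every node.
    """
    node_features: List[List[float]] = []
    edge_index: List[List[int]] = [[], []]
    graph_index: List[int] = []
    offset = 0
    for g, (feats, edges) in enumerate(graphs):
        node_features.extend(feats)
        edge_index[0].extend(src + offset for src in edges[0])
        edge_index[1].extend(dst + offset for dst in edges[1])
        graph_index.extend([g] * len(feats))
        offset += len(feats)
    return node_features, edge_index, graph_index


ring = [[0, 1, 2], [1, 2, 0]]  # a 3-node directed ring in COO format
g1 = ([[1.0], [2.0], [3.0]], ring)
g2 = ([[4.0], [5.0], [6.0]], ring)
x, ei, gi = batch_graphs([g1, g2])
print(ei)  # [[0, 1, 2, 3, 4, 5], [1, 2, 0, 4, 5, 3]]
print(gi)  # [0, 0, 0, 1, 1, 1]
```

Without the offset, both rings would point at nodes 0 to 2 and the second graph's edges would silently attach to the first graph's nodes.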

__init__(graph_list: Sequence[deepchem.feat.graph_data.GraphData])
Parameters: graph_list (Sequence[GraphData]) – List of GraphData objects.
