Metrics
Metrics are one of the most important parts of machine learning. Unlike traditional software, in which algorithms either work or don’t work, machine learning models work in degrees. That is, there’s a continuous range of “goodness” for a model. “Metrics” are functions which measure how well a model works. Which metric is appropriate depends on the type of model at hand.
Metric Utilities

deepchem.metrics.to_one_hot(y, n_classes=2)
Transforms a label vector into a one-hot encoding.
Turns y into an array of shape (n_samples, n_classes) with a one-hot encoding.
Parameters: y (np.ndarray) – A vector of shape (n_samples, 1)
Returns: A numpy.ndarray of shape (n_samples, n_classes).
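The transformation described above can be illustrated with plain NumPy. This is a minimal sketch, not DeepChem's implementation; the name to_one_hot_sketch is hypothetical and the labels are assumed to be integers in [0, n_classes).

```python
import numpy as np


def to_one_hot_sketch(y, n_classes=2):
    """Hypothetical stand-in that mimics the documented shapes."""
    # Flatten (n_samples, 1) to (n_samples,) and use the labels as column indices.
    y = np.asarray(y).reshape(-1).astype(int)
    one_hot = np.zeros((y.shape[0], n_classes))
    # Set the column named by each label to 1.
    one_hot[np.arange(y.shape[0]), y] = 1.0
    return one_hot


labels = np.array([[0], [1], [1], [0]])
encoded = to_one_hot_sketch(labels)
```

Each row of encoded contains a single 1 in the column given by the corresponding label.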

deepchem.metrics.from_one_hot(y, axis=1)
Transforms a label vector from a one-hot encoding.
Parameters:  y (np.ndarray) – An array of shape (n_samples, num_classes)
 axis (int, optional (default 1)) – The axis with one-hot encodings to reduce on.
Returns: A numpy.ndarray of shape (n_samples,)
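The inverse direction can be sketched as an argmax along the one-hot axis. The helper name from_one_hot_sketch is hypothetical, and the snippet only illustrates the documented input and output shapes.

```python
import numpy as np


def from_one_hot_sketch(y, axis=1):
    """Hypothetical stand-in: recover class indices from one-hot rows."""
    # The position of the largest entry along `axis` is the class index.
    return np.argmax(np.asarray(y), axis=axis)


encoded = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]])
labels = from_one_hot_sketch(encoded)
```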
Metric Functions

deepchem.metrics.roc_auc_score(y, y_pred)
Area under the receiver operating characteristic curve.
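For intuition, the area under the ROC curve equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes that quantity directly for binary labels. It assumes y holds 0/1 labels and y_pred holds one score per sample, and roc_auc_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def roc_auc_sketch(y, y_pred):
    """Pairwise-comparison form of ROC AUC for binary labels."""
    y = np.asarray(y)
    y_pred = np.asarray(y_pred, dtype=float)
    pos = y_pred[y == 1]
    neg = y_pred[y == 0]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum()
    ties = (diff == 0).sum()
    return float((wins + 0.5 * ties) / diff.size)


auc = roc_auc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here three of the four positive/negative pairs are ordered correctly, so auc is 0.75. The pairwise form uses memory quadratic in the number of samples, so it is only suitable for small illustrations.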

deepchem.metrics.accuracy_score(y, y_pred)
Compute accuracy score.
Computes accuracy score for classification tasks. Works for both binary and multiclass classification.
Parameters:  y (np.ndarray) – Of shape (N_samples,)
 y_pred (np.ndarray) – Of shape (N_samples,)
Returns: score – The fraction of correctly classified samples. A number between 0 and 1.
Return type: float
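As a quick illustration of the definition, the fraction of correctly classified samples can be computed with NumPy alone. The helper accuracy_sketch is a hypothetical name used only for this example.

```python
import numpy as np


def accuracy_sketch(y, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y = np.asarray(y)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y == y_pred))


# Multiclass example: three of the four predictions are correct.
score = accuracy_sketch([0, 1, 2, 2], [0, 2, 2, 2])
```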

deepchem.metrics.pearson_r2_score(y, y_pred)
Computes Pearson R^2 (square of Pearson correlation).
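A minimal sketch of the quantity, assuming 1-D arrays and using np.corrcoef for the Pearson correlation. The helper name pearson_r2_sketch is hypothetical.

```python
import numpy as np


def pearson_r2_sketch(y, y_pred):
    """Square of the Pearson correlation coefficient between y and y_pred."""
    r = np.corrcoef(np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float))[0, 1]
    return float(r ** 2)


perfect = pearson_r2_sketch([1, 2, 3, 4], [2, 4, 6, 8])
anti = pearson_r2_sketch([1, 2, 3, 4], [8, 6, 4, 2])
```

Because the correlation is squared, a perfectly anti-correlated prediction also scores close to 1, so this metric measures linear association rather than agreement in value.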

deepchem.metrics.jaccard_index(y, y_pred)
Computes the Jaccard index, also known as intersection over union (IoU), a metric commonly used in image segmentation tasks.
Parameters:  y – ground truth array
 y_pred – predicted array
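For binary masks, intersection over union can be sketched as follows. This assumes both inputs are 0/1 arrays of the same shape, and jaccard_sketch is a hypothetical helper rather than the library function.

```python
import numpy as np


def jaccard_sketch(y, y_pred):
    """Intersection over union of two binary masks."""
    y = np.asarray(y).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    intersection = np.logical_and(y, y_pred).sum()
    union = np.logical_or(y, y_pred).sum()
    return float(intersection / union)


truth = np.array([[1, 1, 0], [0, 1, 0]])
pred = np.array([[1, 0, 0], [0, 1, 1]])
iou = jaccard_sketch(truth, pred)
```

Two pixels are set in both masks and four are set in at least one, so iou is 0.5. The sketch does not guard against an empty union.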

deepchem.metrics.pixel_error(y, y_pred)
An error metric for the case where y and y_pred are images.
Defined as 1 - the maximal F-score of pixel similarity, or squared Euclidean distance between the original and the result labels.
Parameters:  y (np.ndarray) – ground truth array
 y_pred (np.ndarray) – predicted array

deepchem.metrics.kappa_score(y_true, y_pred)
Calculate Cohen’s kappa for classification tasks.
See https://en.wikipedia.org/wiki/Cohen%27s_kappa
Note that this implementation of Cohen’s kappa expects binary labels.
Parameters:  y_true (np.ndarray) – Numpy array containing true values.
 y_pred (np.ndarray) – Numpy array containing predicted values.
Returns: kappa – Numpy array containing kappa for each classification task.
Return type: np.ndarray
Raises: AssertionError – If y_true and y_pred are not the same size, or if class labels are not in [0, 1].
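Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below applies that formula to binary labels. It is an illustration with a hypothetical name (kappa_sketch), not the library's code, and it does not handle the degenerate case p_e = 1.

```python
import numpy as np


def kappa_sketch(y_true, y_pred):
    """Cohen's kappa for two binary (0/1) label vectors of equal length."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    assert y_true.shape == y_pred.shape
    # Observed agreement: fraction of samples with identical labels.
    p_o = np.mean(y_true == y_pred)
    # Chance agreement: both say 1, or both say 0, if the two were independent.
    p_true_1 = np.mean(y_true == 1)
    p_pred_1 = np.mean(y_pred == 1)
    p_e = p_true_1 * p_pred_1 + (1 - p_true_1) * (1 - p_pred_1)
    return float((p_o - p_e) / (1 - p_e))


kappa = kappa_sketch([1, 1, 0, 0], [1, 0, 0, 0])
```

In this example p_o is 0.75 and p_e is 0.5, so kappa is 0.5.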

deepchem.metrics.bedroc_score(y_true, y_pred, alpha=20.0)
BEDROC metric implemented according to Truchon and Bayly, which modifies the ROC score by allowing for a factor of early recognition.
Parameters:  y_true (array_like) – Binary class labels. 1 for positive class, 0 otherwise
 y_pred (array_like) – Predicted labels
 alpha (float), default 20.0 – Early recognition parameter
Returns: Value in [0, 1] that indicates the degree of early recognition
Return type: float
Notes
The original paper by Truchon et al. is located at https://pubs.acs.org/doi/pdf/10.1021/ci600426e
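To make the idea of early recognition concrete, the sketch below weights each active by exp(-alpha * rank / N), so actives ranked near the top contribute most, and then rescales the sum between its worst and best possible values. This follows the definition of BEDROC as a min-max normalized RIE, but it is a simplified illustration under stated assumptions: y_pred is taken to be a 1-D array of scores, ties are broken by input order, and bedroc_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def bedroc_sketch(y_true, y_pred, alpha=20.0):
    """Exponentially weighted rank sum of the actives, rescaled to [0, 1]."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    n_total = y_true.shape[0]
    n_actives = int(y_true.sum())
    assert 0 < n_actives < n_total, "need at least one active and one inactive"
    # Rank samples from highest to lowest score (rank 1 is the best).
    order = np.argsort(-y_pred, kind="stable")
    weights = np.exp(-alpha * np.arange(1, n_total + 1) / n_total)
    observed = weights[y_true[order] == 1].sum()
    best = weights[:n_actives].sum()              # all actives ranked first
    worst = weights[n_total - n_actives:].sum()   # all actives ranked last
    return float((observed - worst) / (best - worst))


labels = np.array([1, 1, 0, 0, 0, 0])
top = bedroc_sketch(labels, [0.9, 0.8, 0.3, 0.2, 0.1, 0.05])
bottom = bedroc_sketch(labels, [0.05, 0.1, 0.2, 0.3, 0.8, 0.9])
mixed = bedroc_sketch(labels, [0.9, 0.1, 0.8, 0.3, 0.2, 0.05])
```

Larger values of alpha concentrate the weight on the very top of the ranked list.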

deepchem.metrics.genomic_metrics.get_motif_scores(encoded_sequences, motif_names, max_scores=None, return_positions=False, GC_fraction=0.4)
Computes position weight matrix (PWM) log odds.
Parameters:  encoded_sequences (4darray) – (N_sequences, N_letters, sequence_length, 1) array
 motif_names (list of strings) –
 max_scores (int, optional) –
 return_positions (boolean, optional) –
 GC_fraction (float, optional) –
Returns:  (N_sequences, num_motifs, seq_length) complete score array by default.
 If max_scores, (N_sequences, num_motifs*max_scores) max score array.
 If max_scores and return_positions, (N_sequences, 2*num_motifs*max_scores)
 array with max scores and their positions.

deepchem.metrics.genomic_metrics.get_pssm_scores(encoded_sequences, pssm)
Convolves pssm and its reverse complement with encoded sequences and returns the maximum score at each position of each sequence.
Parameters:  encoded_sequences (4darray) – (N_sequences, N_letters, sequence_length, 1) array
 pssm (2darray) – (4, pssm_length) array
Returns: scores – (N_sequences, sequence_length)
Return type: 2darray

deepchem.metrics.genomic_metrics.in_silico_mutagenesis(model, X)
Computes in-silico mutagenesis (ISM) scores.
Parameters:  model (Model) – This can be any model that accepts inputs of the required shape and produces an output of shape (N_sequences, N_tasks).
 X (ndarray) – Shape (N_sequences, N_letters, sequence_length, 1)
Returns: (num_task, N_sequences, N_letters, sequence_length, 1) ISM score array.
Metric Class
The dc.metrics.Metric class is a wrapper around metric functions which interoperates with DeepChem dc.models.Model.

class deepchem.metrics.Metric(metric, task_averager=None, name=None, threshold=None, mode=None, compute_energy_metric=False)
Wrapper class for computing user-defined metrics.
This class aims to support a variety of different metrics. The simplest are classification and regression metrics that assume the values to compare are scalars. In more complicated cases, the values to compare may be two image arrays.
The Metric class provides a wrapper for standardizing the API around different classes of metrics that may be useful for DeepChem models. The implementation provides a few non-standard conveniences such as built-in support for multitask and multiclass metrics, and support for multidimensional outputs.

__init__(metric, task_averager=None, name=None, threshold=None, mode=None, compute_energy_metric=False)
Parameters:  metric (function) – function that takes args y_true, y_pred (in that order) and computes desired score.
 task_averager (function, optional) – If not None, should be a function that averages metrics across tasks. For example, task_averager=np.mean. If task_averager is provided, this metric is treated as a multitask metric.
 name (str, optional) – Name of this metric
 threshold (float, optional) – Used for binary metrics and is the threshold for the positive class
 mode (str, optional) – Must be either classification or regression.
 compute_energy_metric (TODO(rbharath): Should this be removed?) –

compute_metric(y_true, y_pred, w=None, n_classes=2, filter_nans=True, per_task_metrics=False)
Compute a performance metric for each task.
Parameters:  y_true (np.ndarray) – An np.ndarray containing true values for each task.
 y_pred (np.ndarray) – An np.ndarray containing predicted values for each task.
 w (np.ndarray, optional) – An np.ndarray containing weights for each datapoint.
 n_classes (int, optional) – Number of classes in data for classification tasks.
 filter_nans (bool, optional) – Remove NaN values in computed metrics
 per_task_metrics (bool, optional) – If true, return computed metric for each task on multitask dataset.
Returns: A np.ndarray containing metric values for each task.
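The per-task computation and the role of a task averager such as np.mean can be sketched with NumPy alone. This is an illustration of the idea, not the Metric class itself: per_task_scores and mean_abs_error are hypothetical helpers, and y_true and y_pred are assumed to have shape (n_samples, n_tasks).

```python
import numpy as np


def mean_abs_error(y_true, y_pred):
    """Example metric taking y_true, y_pred in that order."""
    return float(np.mean(np.abs(y_true - y_pred)))


def per_task_scores(metric, y_true, y_pred):
    """Apply `metric` to each task column and collect the results."""
    n_tasks = y_true.shape[1]
    return np.array([metric(y_true[:, t], y_pred[:, t]) for t in range(n_tasks)])


y_true = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
y_pred = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])

scores = per_task_scores(mean_abs_error, y_true, y_pred)
averaged = np.mean(scores)  # the averager reduces per-task scores to one number
```

Keeping the per-task array alongside the averaged value is useful when one poorly predicted task would otherwise be hidden by the mean.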

compute_singletask_metric(y_true, y_pred, w)
Compute a metric value.
Parameters:  y_true (list) – A list of arrays containing true values for each task.
 y_pred (list) – A list of arrays containing predicted values for each task.
Returns: Float metric value.
Raises: NotImplementedError: If metric_str is not in METRICS.
