Metalearning¶
One of the hardest challenges in scientific machine learning is the lack of access to sufficient data. Sometimes experiments are slow and expensive, and there is no easy way to obtain more data. What do you do then?
This module contains a collection of techniques for low-data learning. “Metalearning” traditionally refers to techniques for “learning to learn”, but here we take it to mean any technique that proves effective for learning with small amounts of data.
MetaLearner¶
This is the abstract superclass for metalearning algorithms.
- class MetaLearner[source]¶
Model and data to which the MAML algorithm can be applied.
To use MAML, create a subclass of this defining the learning problem to solve. It consists of a model that can be trained to perform many different tasks, and data for training it on a large (possibly infinite) set of different tasks.
- compute_model(inputs, variables, training)[source]¶
Compute the model for a set of inputs and variables.
- Parameters:
inputs (list of tensors) – the inputs to the model
variables (list of tensors) – the values to use for the model’s variables. This might be the actual variables (as returned by the MetaLearner’s variables property), or alternatively it might be the values of those variables after one or more steps of gradient descent for the current task.
training (bool) – indicates whether the model is being invoked for training or prediction
- Returns:
(loss, outputs) where loss is the value of the model’s loss function, and outputs is a list of the model’s outputs
- select_task()[source]¶
Select a new task to train on.
If there is a fixed set of training tasks, this will typically cycle through them. If there are infinitely many training tasks, this can simply select a new one each time it is called.
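A minimal skeleton of a MetaLearner subclass might look like the sketch below. The import path, the variables property, and the get_batch() method are taken from the PyTorch example later on this page; treat the details as an illustration rather than a definitive template.
>>> from deepchem.metalearning import MetaLearner
>>> class MyLearner(MetaLearner):
...     def compute_model(self, inputs, variables, training):
...         # Build the model from `inputs` using the supplied `variables`
...         # and return (loss, [outputs]).
...         raise NotImplementedError
...     @property
...     def variables(self):
...         # The list of trainable tensors that meta-learning will optimize.
...         raise NotImplementedError
...     def select_task(self):
...         # Switch the learner to a new training task.
...         raise NotImplementedError
...     def get_batch(self):
...         # Return a batch of inputs for the currently selected task.
...         raise NotImplementedError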
TensorFlow implementation¶
MAML¶
- class MAML(learner, learning_rate=0.001, optimization_steps=1, meta_batch_size=10, optimizer=<deepchem.models.optimizers.Adam object>, model_dir=None)[source]¶
Implements the Model-Agnostic Meta-Learning algorithm for low data learning.
The algorithm is described in Finn et al., “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” (https://arxiv.org/abs/1703.03400). It is used for training models that can perform a variety of tasks, depending on what data they are trained on. It assumes you have training data for many tasks, but only a small amount for each one. It performs “meta-learning” by looping over tasks and trying to minimize the loss on each one after one or a few steps of gradient descent. That is, it does not try to create a model that can directly solve the tasks, but rather tries to create a model that is very easy to train.
To use this class, create a subclass of MetaLearner that encapsulates the model and data for your learning problem. Pass it to a MAML object and call fit(). You can then use train_on_current_task() to fine-tune the model for a particular task.
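As an illustrative sketch (SineLearner is a hypothetical MetaLearner subclass, analogous to the one defined in the PyTorch example below):
>>> from deepchem.metalearning import MAML
>>> learner = SineLearner()                  # your MetaLearner subclass
>>> maml = MAML(learner, meta_batch_size=10)
>>> maml.fit(10000)                          # meta-train across many tasks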
- __init__(learner, learning_rate=0.001, optimization_steps=1, meta_batch_size=10, optimizer=<deepchem.models.optimizers.Adam object>, model_dir=None)[source]¶
Create an object for performing meta-optimization.
- Parameters:
learner (MetaLearner) – defines the meta-learning problem
learning_rate (float or Tensor) – the learning rate to use for optimizing each task (not to be confused with the one used for meta-learning). This can optionally be made a variable (represented as a Tensor), in which case the learning rate will itself be learnable; see the sketch after this parameter list.
optimization_steps (int) – the number of steps of gradient descent to perform for each task
meta_batch_size (int) – the number of tasks to use for each step of meta-learning
optimizer (Optimizer) – the optimizer to use for meta-learning (not to be confused with the gradient descent optimization performed for each task)
model_dir (str) – the directory in which the model will be saved. If None, a temporary directory will be created.
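For example, the inner-loop learning rate might be passed as a TensorFlow variable so that it can itself be optimized. This is a sketch based on the parameter description above, assuming TensorFlow as the backend; the exact behavior depends on the implementation.
>>> import tensorflow as tf
>>> inner_lr = tf.Variable(0.001)   # a Tensor, so the per-task learning rate becomes learnable
>>> maml = MAML(learner, learning_rate=inner_lr, optimization_steps=1)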
- fit(steps, max_checkpoints_to_keep=5, checkpoint_interval=600, restore=False)[source]¶
Perform meta-learning to train the model.
- Parameters:
steps (int) – the number of steps of meta-learning to perform
max_checkpoints_to_keep (int) – the maximum number of checkpoint files to keep. When this number is reached, older files are deleted.
checkpoint_interval (float) – the time interval at which to save checkpoints, measured in seconds
restore (bool) – if True, restore the model from the most recent checkpoint before training it further
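For instance, meta-training can be split across sessions by restoring from the most recent checkpoint (a sketch using the maml object from the examples above):
>>> maml.fit(5000)                  # initial meta-training; checkpoints are saved periodically
>>> maml.fit(5000, restore=True)    # resume later from the most recent checkpoint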
- train_on_current_task(optimization_steps=1, restore=True)[source]¶
Perform a few steps of gradient descent to fine-tune the model on the current task.
- Parameters:
optimization_steps (int) – the number of steps of gradient descent to perform
restore (bool) – if True, restore the model from the most recent checkpoint before optimizing
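Continuing the sketch above, adapting the meta-trained model to a new task might look like this:
>>> learner.select_task()                              # switch to a previously unseen task
>>> maml.train_on_current_task(optimization_steps=1)   # few-shot adaptation to that task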
Torch implementation¶
MAML¶
- class MAML(learner: deepchem.metalearning.MetaLearner, learning_rate: float | deepchem.models.optimizers.LearningRateSchedule = 0.001, optimization_steps: int = 1, meta_batch_size: int = 10, optimizer: deepchem.models.optimizers.Optimizer = <deepchem.models.optimizers.Adam object>, model_dir: str | None = None, device: torch.device | None = None)[source]¶
Implements the Model-Agnostic Meta-Learning algorithm for low data learning.
The algorithm is described in Finn et al., “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks” (https://arxiv.org/abs/1703.03400). It is used for training models that can perform a variety of tasks, depending on what data they are trained on. It assumes you have training data for many tasks, but only a small amount for each one. It performs “meta-learning” by looping over tasks and trying to minimize the loss on each one after one or a few steps of gradient descent. That is, it does not try to create a model that can directly solve the tasks, but rather tries to create a model that is very easy to train.
To use this class, create a subclass of MetaLearner that encapsulates the model and data for your learning problem. Pass it to a MAML object and call fit(). You can then use train_on_current_task() to fine-tune the model for a particular task.
Example
>>> import deepchem as dc
>>> import numpy as np
>>> import torch
>>> import torch.nn.functional as F
>>> from deepchem.metalearning.torch_maml import MetaLearner, MAML
>>> class SineLearner(MetaLearner):
...     def __init__(self):
...         self.batch_size = 10
...         self.w1 = torch.nn.Parameter(torch.tensor(np.random.normal(size=[1, 40], scale=1.0), requires_grad=True))
...         self.w2 = torch.nn.Parameter(torch.tensor(np.random.normal(size=[40, 40], scale=np.sqrt(1 / 40)), requires_grad=True))
...         self.w3 = torch.nn.Parameter(torch.tensor(np.random.normal(size=[40, 1], scale=np.sqrt(1 / 40)), requires_grad=True))
...         self.b1 = torch.nn.Parameter(torch.tensor(np.zeros(40)), requires_grad=True)
...         self.b2 = torch.nn.Parameter(torch.tensor(np.zeros(40)), requires_grad=True)
...         self.b3 = torch.nn.Parameter(torch.tensor(np.zeros(1)), requires_grad=True)
...     def compute_model(self, inputs, variables, training):
...         x, y = inputs
...         w1, w2, w3, b1, b2, b3 = variables
...         dense1 = F.relu(torch.matmul(x, w1) + b1)
...         dense2 = F.relu(torch.matmul(dense1, w2) + b2)
...         output = torch.matmul(dense2, w3) + b3
...         loss = torch.mean(torch.square(output - y))
...         return loss, [output]
...     @property
...     def variables(self):
...         return [self.w1, self.w2, self.w3, self.b1, self.b2, self.b3]
...     def select_task(self):
...         self.amplitude = 5.0 * np.random.random()
...         self.phase = np.pi * np.random.random()
...     def get_batch(self):
...         x = torch.tensor(np.random.uniform(-5.0, 5.0, (self.batch_size, 1)))
...         return [x, torch.tensor(self.amplitude * np.sin(x + self.phase))]
...     def parameters(self):
...         for key, value in self.__dict__.items():
...             if isinstance(value, torch.nn.Parameter):
...                 yield value
>>> learner = SineLearner()
>>> optimizer = dc.models.optimizers.Adam(learning_rate=5e-3)
>>> maml = MAML(learner, meta_batch_size=4, optimizer=optimizer)
>>> maml.fit(9000)
To test it out on a new task and see how well it works:
>>> learner.select_task()
>>> maml.restore()
>>> batch = learner.get_batch()
>>> loss, outputs = maml.predict_on_batch(batch)
>>> maml.train_on_current_task()
>>> loss, outputs = maml.predict_on_batch(batch)
- __init__(learner: deepchem.metalearning.MetaLearner, learning_rate: float | deepchem.models.optimizers.LearningRateSchedule = 0.001, optimization_steps: int = 1, meta_batch_size: int = 10, optimizer: deepchem.models.optimizers.Optimizer = <deepchem.models.optimizers.Adam object>, model_dir: str | None = None, device: torch.device | None = None)[source]¶
Create an object for performing meta-optimization.
- Parameters:
learner (MetaLearner) – defines the meta-learning problem
learning_rate (float or Tensor) – the learning rate to use for optimizing each task (not to be confused with the one used for meta-learning). This can optionally be made a variable (represented as a Tensor), in which case the learning rate will itself be learnable.
optimization_steps (int) – the number of steps of gradient descent to perform for each task
meta_batch_size (int) – the number of tasks to use for each step of meta-learning
optimizer (Optimizer) – the optimizer to use for meta-learning (not to be confused with the gradient descent optimization performed for each task)
model_dir (str) – the directory in which the model will be saved. If None, a temporary directory will be created.
device (torch.device, optional (default None)) – the device on which to run computations. If None, a device is chosen automatically.
- fit(steps: int, max_checkpoints_to_keep: int = 5, checkpoint_interval: int = 600, restore: bool = False)[source]¶
Perform meta-learning to train the model.
- Parameters:
steps (int) – the number of steps of meta-learning to perform
max_checkpoints_to_keep (int) – the maximum number of checkpoint files to keep. When this number is reached, older files are deleted.
checkpoint_interval (int) – the time interval at which to save checkpoints, measured in seconds
restore (bool) – if True, restore the model from the most recent checkpoint before training it further
- train_on_current_task(optimization_steps: int = 1, restore: bool = True)[source]¶
Perform a few steps of gradient descent to fine-tune the model on the current task.
- Parameters:
optimization_steps (int) – the number of steps of gradient descent to perform
restore (bool) – if True, restore the model from the most recent checkpoint before optimizing
- predict_on_batch(inputs: Tensor | Sequence[Tensor]) Tuple[Tensor, Sequence[Tensor]] [source]¶
Compute the model’s outputs for a batch of inputs.
- Parameters:
inputs (list of arrays) – the inputs to the model
- Returns:
(loss, outputs) where loss is the value of the model’s loss function, and outputs is a list of the model’s outputs
- save_checkpoint(max_checkpoints_to_keep: int = 5, model_dir: str | None = None) None [source]¶
Save a checkpoint to disk.
Usually you do not need to call this method, since fit() saves checkpoints automatically. If you have disabled automatic checkpointing during fitting, this can be called to manually write checkpoints.
- Parameters:
max_checkpoints_to_keep (int) – the maximum number of checkpoints to keep. Older checkpoints are discarded.
model_dir (str, default None) – the model directory to save the checkpoint to. If None, the checkpoint is saved to self.model_dir.
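For example, a checkpoint can be written manually after additional training (a minimal sketch using the maml object from the example above; the directory path is purely illustrative):
>>> maml.save_checkpoint(max_checkpoints_to_keep=1)            # write to maml.model_dir
>>> maml.save_checkpoint(model_dir='/tmp/maml_checkpoints')    # or to an explicit directory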