Automated Hyper-parameter Optimization in SageMaker

Oct 16, 2018

What is Hyper-Parameter Optimization (HPO)?

So you've built your model, you're getting sensible results, and you're now ready to squeeze out as much performance as possible. One possibility is Grid Search, where you try every possible combination of hyper-parameter values and choose the best one. That works well if the number of choices is relatively small, but what if you have a large number of hyper-parameters, and some are continuous values that might span several orders of magnitude? Random Search works pretty well to explore the parameter space without committing to exploring all of it, but is randomly groping in the dark the best we can do?

Of course not. Bayesian Optimization is a technique for optimizing a function through a sequence of decisions. In this case, we're trying to maximize performance by choosing hyper-parameter values. The sequential framework means that the hyper-parameters you choose for the next step are influenced by the performance of all the previous attempts. Bayesian Optimization makes principled decisions about how to balance exploring new regions of the parameter space vs exploiting regions that are known to perform well. This is all to say that it generally needs far fewer training runs to find good hyper-parameters than alternatives like Grid Search and Random Search.
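To make the cost of Grid Search concrete, here is a minimal sketch that simply counts how many training jobs a grid would need. The candidate values are made up for illustration and are not a recommendation.

```python
from itertools import product

# Hypothetical candidate values for three hyper-parameters (illustrative only).
grid = {
    "learning_rate": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    "number_nodes": [32, 64, 128, 256, 512],
    "optimizer": ["Adam", "SGD"],
}

# Grid Search trains one model per combination of values.
combinations = list(product(*grid.values()))
print(len(combinations))  # 5 * 5 * 2 = 50 training jobs

# One more hyper-parameter with 5 candidate values multiplies the cost by 5.
grid["number_layers"] = [2, 3, 4, 5, 6]
bigger_grid_size = len(list(product(*grid.values())))
print(bigger_grid_size)  # 250 training jobs
```

The count grows multiplicatively with every hyper-parameter you add, which is why exhaustive search stops being practical so quickly.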

How to do it

The good news is that SageMaker makes this very easy because the platform takes care of the following:

Implementing the Bayesian Optimization algorithm to handle categorical, integer, and float hyper-parameters
Orchestrating the training and evaluation of models given a set of hyper-parameters from the HPO service
Integrating the training jobs with the HPO service, which passes the selected hyper-parameter values to each job and receives its performance once the job is complete

Prerequisites

The code below will assume we're working with a TensorFlow Estimator model, but the HPO-relevant parts should extend to any SageMaker Estimator. To run code in the way this example presents, you'll need the following:

Some understanding of how SageMaker works. If you'd like some examples of that, there are several official notebook examples in this repo. You might find the TensorFlow HPO example particularly relevant.
SageMaker's Python SDK installed
The necessary API permissions configured, or a SageMaker Notebook Instance to run the code in

Step 1 - Create an Estimator

A key requirement to run HPO with SageMaker is that your model needs to both:

Expect the hyper-parameters to be passed from SageMaker
Write performance metrics to the logs

For built-in algorithms, this has already been completed for you. In the case of using SageMaker to build arbitrary TensorFlow models, this means configuring things correctly in the model.py file, a.k.a. the “entry point”. This is the file that SageMaker uses to build your TensorFlow model, and it expects certain functions to be defined that adhere to a particular input/output scheme. (See the TensorFlow README for more details about the functions you need to specify.)

Get your model ready to accept hyper-parameters from SageMaker

To dynamically specify parameter values, your model code needs to accept, parse, and utilize them. In TensorFlow, you let SageMaker pass in hyper-parameters by adding a hyperparameters argument to the functions you define in the entry point file. For example, for a hyper-parameter needed in your model_fn:

DEFAULT_LEARNING_RATE = 1e-3


def model_fn(features, labels, mode, hyperparameters=None):
    if hyperparameters is None:
        hyperparameters = dict()
    # Extract parameters
    learning_rate = hyperparameters.get('learning_rate', DEFAULT_LEARNING_RATE)
    ...

You might also want a hyper-parameter in the train_input_fn, e.g. to specify the number of training epochs:

def train_input_fn(training_dir, hyperparameters=None):
    # Extract parameters
    if not hyperparameters:
        hyperparameters = dict()

    epochs = hyperparameters.get('epochs', None)
    ...

These examples extract the parameter if it's specified, but use a default if not.
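The fallback behaviour can be checked outside of SageMaker. This is a minimal sketch that mirrors the lookup used in model_fn; the extract_learning_rate helper is a hypothetical name introduced here only for illustration.

```python
DEFAULT_LEARNING_RATE = 1e-3


def extract_learning_rate(hyperparameters=None):
    """Mirror the lookup in model_fn: use the passed value, else the default."""
    if hyperparameters is None:
        hyperparameters = dict()
    return hyperparameters.get('learning_rate', DEFAULT_LEARNING_RATE)


# Value supplied (as the HPO service would do) vs. nothing supplied.
tuned = extract_learning_rate({'learning_rate': 0.01})
fallback = extract_learning_rate()
print(tuned, fallback)
```

Keeping a default means the same entry point file still works for an ordinary, non-HPO training job.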

Write performance metrics to logs

The second requirement, writing performance metrics to the logs, follows from how SageMaker works: it obtains the model performance of a run by extracting it from the text of the training logs. These are the values that are sent back to the HPO engine.

For TensorFlow, metrics that are specified in the EstimatorSpec are written to the logs by default. For example, this code exists as part of my model_fn:

def model_fn(features, labels, mode, hyperparameters=None):
    ...
    if mode == tf.estimator.ModeKeys.EVAL:
        eval_metric_ops = {
            "roc_auc": tf.metrics.auc(
                labels, predictions, summation_method='careful_interpolation'),
            "pr_auc": tf.metrics.auc(
                labels, predictions, summation_method='careful_interpolation', curve='PR'),
        }
    else:
        # e.g. in "training" mode
        eval_metric_ops = {}

    return tf.estimator.EstimatorSpec(
        mode=mode,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=eval_metric_ops,
    )

During training, the model will periodically pause and evaluate on the test set (you can configure the details of this process). The logs for these events will look something like the following:

2018-10-02 17:23:40,657 INFO - tensorflow - Saving dict for global step 101: global_step = 101, loss = 0.45420808, pr_auc = 0.36799875, roc_auc = 0.6891242

This is what SageMaker will use to measure the performance of any particular training job.

Build the estimator

An Estimator is normally used to kick off a single training job. It lets you tell SageMaker where to store the outputs, which instances to use for training, and so on. Now that the functions in the entry point file have been properly configured to accept hyperparameters and write performance metrics to the logs, you can create the TensorFlow Estimator:

from sagemaker.tensorflow import TensorFlow

# The parameters that are constant and will not be tuned
shared_hyperparameters = {
    'number_layers': 5,
}

tf_estimator = TensorFlow(
    entry_point='my/tensorflow/model.py',
    role='<sagemaker_role_arn>',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    training_steps=10000,
    hyperparameters=shared_hyperparameters,
)

Step 2 - Define the performance metrics

In this step, we need to tell SageMaker how to extract the performance information from the logs. This is done by specifying a regular expression and assigning it to a metric name. Although you can specify multiple expressions (whose values are automatically gathered in AWS CloudWatch for easy plotting/monitoring), one of them needs to be singled out as the optimization objective of the HPO. You also need to specify whether you want to maximize or minimize that value. Note that while the regular expression will likely match multiple log entries, it's the last match in the logs that's returned as the final performance value.

objective_metric_name = 'PR-AUC'
objective_type = 'Maximize'
metric_definitions = [
    {'Name': 'ROC-AUC', 'Regex': 'roc_auc = ([0-9\\.]+)'},
    {'Name': 'PR-AUC', 'Regex': 'pr_auc = ([0-9\\.]+)'}
]
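It's worth testing these expressions locally before paying for a tuning job. The sketch below applies them to log text in the format shown earlier and keeps the last match, which approximates the behaviour described above; it is not SageMaker's actual extraction code, and the second log line is made up for illustration.

```python
import re

# Two evaluation log lines; the second one's values are invented for this example.
log_text = (
    "2018-10-02 17:23:40,657 INFO - tensorflow - Saving dict for global step 101: "
    "global_step = 101, loss = 0.45420808, pr_auc = 0.36799875, roc_auc = 0.6891242\n"
    "2018-10-02 17:25:10,112 INFO - tensorflow - Saving dict for global step 201: "
    "global_step = 201, loss = 0.41, pr_auc = 0.40, roc_auc = 0.71\n"
)

metric_definitions = [
    {'Name': 'ROC-AUC', 'Regex': 'roc_auc = ([0-9\\.]+)'},
    {'Name': 'PR-AUC', 'Regex': 'pr_auc = ([0-9\\.]+)'},
]

final_values = {}
for definition in metric_definitions:
    # findall returns the captured group for every match, in log order.
    matches = re.findall(definition['Regex'], log_text)
    # Keep only the last match as the final value for this metric.
    final_values[definition['Name']] = float(matches[-1])

print(final_values)
```

If a pattern matches nothing here, the matches list is empty and the lookup fails immediately, which is a much cheaper way to find a typo than a failed HPO run.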

Step 3 - Define the hyper-parameter search space

We now need to specify what our hyper-parameters are called, what type they are (continuous, integer, or categorical), and what their possible values are. Below is an example:

from sagemaker.tuner import (IntegerParameter, CategoricalParameter,
    ContinuousParameter, HyperparameterTuner)

hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-1),
    "number_nodes": IntegerParameter(32, 512),
    "optimizer": CategoricalParameter(['Adam', 'SGD'])
}
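To get a feel for what a single candidate from this search space looks like, here is a small sketch in plain Python. It is not how SageMaker's Bayesian Optimization picks values: it draws one candidate at random, which is what Random Search would do. The sample_candidate helper and the choice to sample the learning rate on a log scale are my own assumptions.

```python
import math
import random


def sample_candidate(rng):
    """Draw one random candidate from the ranges above (hypothetical helper)."""
    # Sample the exponent uniformly so each order of magnitude is equally likely.
    log_lr = rng.uniform(math.log10(1e-5), math.log10(1e-1))
    return {
        "learning_rate": 10 ** log_lr,
        "number_nodes": rng.randint(32, 512),  # inclusive on both ends
        "optimizer": rng.choice(['Adam', 'SGD']),
    }


# A seeded generator keeps the draw reproducible.
candidate = sample_candidate(random.Random(0))
print(candidate)
```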

Step 4 - Specify the number of optimization iterations

Finally, we need to decide how to run the HPO job. If you run many jobs in parallel, then you can explore a large part of the space simultaneously. However, if you just ran a million jobs in parallel, you would effectively be doing Random Search. It's the sequential nature of Bayesian Optimization that allows future runs to be informed by the results of the previous runs.

We therefore need to decide how many total jobs to run, and how many to run in parallel at any given time. For instance, we might run 100 jobs, 5 in parallel. That would be 20 total sequential iterations, exploring 5 points at a time. The choices here will depend on the size of your parameter space and your budget.
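The arithmetic behind that trade-off is simple enough to sketch. The sequential_rounds helper below is a hypothetical name, and treating the jobs as neat rounds is a simplification of how the service actually schedules them.

```python
import math


def sequential_rounds(max_jobs, max_parallel_jobs):
    """Approximate number of sequential rounds for a given job budget."""
    return math.ceil(max_jobs / max_parallel_jobs)


# The example from the text: 100 jobs, 5 at a time.
rounds = sequential_rounds(max_jobs=100, max_parallel_jobs=5)
print(rounds)

# Fully parallel: a single round, so no run can learn from another.
print(sequential_rounds(max_jobs=100, max_parallel_jobs=100))

# Fully sequential: every run is informed by all the previous ones, but it's slow.
print(sequential_rounds(max_jobs=100, max_parallel_jobs=1))
```

More rounds give the optimizer more chances to learn from earlier results, at the price of a longer wall-clock time.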

Now you have everything you need to ask SageMaker to run HPO:


tuner = HyperparameterTuner(
    tf_estimator,
    objective_metric_name,
    hyperparameter_ranges,
    metric_definitions,
    max_jobs=100,
    max_parallel_jobs=5,
    objective_type=objective_type
)

# The data configuration
channel = {
    'training': 's3://<bucket_name>/my/training_file.csv',
    'test': 's3://<bucket_name>/my/test_file.csv',
}

tuner.fit(inputs=channel)

Final Performance

You can use the SDK to poll SageMaker for status reports, or to get the stats of the best job from the HPO run. You can also do this from the AWS SageMaker console, which nicely presents a summary of all the jobs’ performance along with the HPO job configuration.
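As a rough sketch of the SDK route, the helper below wraps two calls on the tuner object. The summarize_tuning name is hypothetical, and I'm assuming your SDK version provides best_training_job() and analytics().dataframe() on HyperparameterTuner, so check the documentation for the version you have installed.

```python
def summarize_tuning(tuner):
    """Collect the best job's name and a per-job results table from a tuner."""
    # Name of the training job with the best objective metric value so far.
    best_job_name = tuner.best_training_job()
    # One row per training job, as a pandas DataFrame.
    results = tuner.analytics().dataframe()
    return best_job_name, results


# Usage (needs a launched tuning job and AWS credentials):
# best_job_name, results = summarize_tuning(tuner)
# print(best_job_name)
# print(results.head())
```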

As mentioned above, you can go to CloudWatch in the console, click on Browse Metrics, and find the metrics you defined in the Name field of metric_definitions from Step 2.

And once everything is complete, you can deploy the best model by simply issuing the following command:

tuner.deploy(
    initial_instance_count=1,
    instance_type='ml.p3.2xlarge'
)