Using Early Stopping
This guide shows how to use early stopping to improve your Katib experiments. Early stopping helps you avoid overfitting when you train your model during Katib experiments. It also saves computing resources and reduces experiment execution time by stopping an experiment’s trials when the target metric(s) no longer improve, before the training process is complete.
The major advantage of using early stopping in Katib is that you don’t need to modify your training container package. All you have to do is make the necessary changes in your experiment’s YAML file.
Early stopping works in the same way as Katib’s metrics collector. It analyses the required metrics from stdout or from an arbitrary output file, and an early stopping algorithm decides whether the trial needs to be stopped. Currently, early stopping works only with the StdOut or File metrics collectors.
Note: Your training container must print training logs with a timestamp, because early stopping algorithms need to know the sequence of reported metrics. Check the PyTorch example to learn how to add a date format to your logs.
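As a rough illustration of the note above, the following Python sketch configures the standard `logging` module so that every log line starts with a timestamp. The timestamp layout, the `name=value` metric format, the logger name `training`, and the helper names `build_logger` and `report_metric` are all assumptions made for this example; match them to whatever your metrics collector actually expects.

```python
import logging
import sys
import time


def build_logger(stream=sys.stdout):
    """Return a logger that prefixes every line with a UTC timestamp."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",  # assumed layout, e.g. 2020-11-05T03:03:43Z
    )
    # Render timestamps in UTC so that the trailing "Z" is accurate.
    formatter.converter = time.gmtime

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)

    logger = logging.getLogger("training")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def report_metric(logger, name, value):
    """Log one metric per line in an assumed `name=value` format."""
    logger.info("%s=%.4f", name, value)


if __name__ == "__main__":
    log = build_logger()
    # Placeholder values standing in for a real training loop.
    for epoch, (loss, accuracy) in enumerate([(0.9, 0.61), (0.7, 0.72)], start=1):
        log.info("epoch=%d", epoch)
        report_metric(log, "loss", loss)
        report_metric(log, "accuracy", accuracy)
```

Writing to stdout keeps the sketch compatible with a StdOut-style collector; for a File collector you would pass an open file object as `stream` instead.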
Configure the experiment with early stopping
As a reference, you can use the YAML file of the early stopping example.
Follow the guide to configure your Katib experiment.
Next, to apply early stopping to your experiment, specify the .spec.earlyStopping parameter, similar to .spec.algorithm. Refer to the EarlyStoppingSpec type for more information.
.earlyStopping.algorithmName - the name of the early stopping algorithm.
.earlyStopping.algorithmSettings - the settings for the early stopping algorithm.
Here is what happens: your experiment’s suggestion produces new trials. After that, the early stopping algorithm generates early stopping rules for the created trials. Once a trial meets all the rules, it is stopped and its status changes to EarlyStopped. Then, Katib calls the suggestion again to ask for new trials.
Learn more about Katib concepts in the overview guide.
Follow the Katib configuration guide to specify your own image for the early stopping algorithm.
Early stopping algorithms in detail
Here’s a list of the early stopping algorithms available in Katib:
Median Stopping Rule
More algorithms are under development.
You can add an early stopping algorithm to Katib yourself. Check the developer guide to contribute.
Median Stopping Rule
The early stopping algorithm name in Katib is medianstop.
The median stopping rule stops a pending trial X at step S if the trial’s best objective value by step S is worse than the median value of the running averages of all completed trials’ objectives reported up to step S.
To learn more about it, check Google Vizier: A Service for Black-Box Optimization.
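The rule described above can be sketched in a few lines of Python. This is a minimal illustration of the definition as stated, not Katib's implementation: the function names, the argument layout (plain lists of objective values, one entry per reported step), and the `maximize` flag are assumptions made for this example.

```python
from statistics import median


def running_average(values, step):
    """Mean of the first `step` reported objective values."""
    window = values[:step]
    return sum(window) / len(window)


def should_stop(trial_values, completed_trials, step, maximize=True):
    """Apply the median stopping rule to one pending trial at `step`.

    trial_values: objective values the pending trial has reported so far.
    completed_trials: one list of objective values per completed trial.
    """
    if step < 1 or len(trial_values) < step:
        return False  # the trial has not reported enough steps yet

    averages = [running_average(v, step) for v in completed_trials if v]
    if not averages:
        return False  # nothing to compare against

    reported = trial_values[:step]
    best = max(reported) if maximize else min(reported)
    threshold = median(averages)
    # "Worse" means lower when maximizing and higher when minimizing.
    return best < threshold if maximize else best > threshold


if __name__ == "__main__":
    completed = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8], [0.4, 0.5, 0.6]]
    print(should_stop([0.3, 0.4, 0.5], completed, step=3))   # weak trial
    print(should_stop([0.5, 0.6, 0.65], completed, step=3))  # competitive trial
```

In the usage example the running averages of the completed trials at step 3 are roughly 0.6, 0.7, and 0.5, so the median is about 0.6. The first trial's best value (0.5) falls below it and the trial would be stopped, while the second trial's best value (0.65) does not.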
Katib supports the following early stopping settings:
Setting Name | Description | Default Value |
---|---|---|
min_trials_required | Minimal number of successful trials to compute median value | 3 |
start_step | Number of reported intermediate results before stopping the trial | 4 |
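To show how these settings might fit into .spec.earlyStopping, here is a sketch of the fragment written as a Python dictionary that mirrors the YAML layout, with both settings at the default values from the table. The list-of-name/value layout for algorithmSettings and the use of string values are assumptions for this example; check the EarlyStoppingSpec type and the early stopping example YAML for the authoritative structure.

```python
import json

# Assumed layout of the early stopping section of an experiment spec.
early_stopping = {
    "algorithmName": "medianstop",
    "algorithmSettings": [
        # Values are written as strings here (assumption).
        {"name": "min_trials_required", "value": "3"},
        {"name": "start_step", "value": "4"},
    ],
}

# Only the early stopping part of the spec is shown; the rest of the
# experiment (objective, algorithm, parameters, trial template) is omitted.
experiment_fragment = {"spec": {"earlyStopping": early_stopping}}

print(json.dumps(experiment_fragment, indent=2))
```

Printing the fragment as JSON makes it easy to compare against your experiment's YAML key by key.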
Submit an early stopping experiment from the UI
You can use Katib UI to submit an early stopping experiment. Follow these steps to create an experiment from the UI.
Once you reach the early stopping section, select the appropriate values.
View the early stopping experiment results
First, make sure you have jq installed.
Check the early stopped trials in your experiment:
kubectl get experiment <experiment-name> -n <experiment-namespace> -o json | jq -r ".status"
The last part of the command output looks similar to this:
. . .
  "earlyStoppedTrialList": [
    "median-stop-2ml8h96d",
    "median-stop-cgjkq8zn",
    "median-stop-pvn5p54p",
    "median-stop-sjc9tcgc"
  ],
  "startTime": "2020-11-05T03:03:43Z",
  "succeededTrialList": [
    "median-stop-2kmh57qf",
    "median-stop-7ccstz4z",
    "median-stop-7sqt7556",
    "median-stop-lgvhfch2",
    "median-stop-mkfjtwbj",
    "median-stop-nfmgqd7w",
    "median-stop-nsbxw5m9",
    "median-stop-nsmhg4p2",
    "median-stop-rp88xflk",
    "median-stop-xl7dlf5n",
    "median-stop-ztc58kwq"
  ],
  "trials": 15,
  "trialsEarlyStopped": 4,
  "trialsSucceeded": 11
}
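If you prefer to post-process the status in a script rather than read it by eye, a sketch along these lines may help. It only relies on the field names visible in the output above (earlyStoppedTrialList and trials); the `summarize_status` helper and the abbreviated sample data are hypothetical and exist only for this example.

```python
import json

# Abbreviated sample of the status object, using field names from the output above.
SAMPLE_STATUS = """
{
  "earlyStoppedTrialList": ["median-stop-2ml8h96d", "median-stop-cgjkq8zn"],
  "succeededTrialList": [
    "median-stop-2kmh57qf",
    "median-stop-7ccstz4z",
    "median-stop-7sqt7556"
  ],
  "trials": 5,
  "trialsEarlyStopped": 2,
  "trialsSucceeded": 3
}
"""


def summarize_status(status):
    """Return the early stopped trial names and their share of all trials."""
    early_stopped = status.get("earlyStoppedTrialList", [])
    total = status.get("trials", 0)
    ratio = len(early_stopped) / total if total else 0.0
    return {
        "early_stopped": sorted(early_stopped),
        "total": total,
        "early_stopped_ratio": ratio,
    }


summary = summarize_status(json.loads(SAMPLE_STATUS))
print(f"{len(summary['early_stopped'])} of {summary['total']} trials were early stopped")
for name in summary["early_stopped"]:
    print(f"  {name}")
```

In practice you would replace the sample string with the JSON produced by the kubectl and jq command shown earlier.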
Check the status of the early stopped trial by running this command:
kubectl get trial median-stop-2ml8h96d -n <experiment-namespace>
You should see the EarlyStopped status for the trial:
NAME                   TYPE           STATUS   AGE
median-stop-2ml8h96d   EarlyStopped   True     15m
In addition, you can check your results in the Katib UI. The experiment monitor page shows the status of each trial, including the early stopped ones.
Click the name of an early stopped trial to see the metrics it reported before it was stopped.
Next steps
Learn how to configure and run your Katib experiments.
Check the Katib Configuration (Katib config).
Learn how to set up environment variables for each Katib component.