Bayesian optimization tutorial

Basic Bayesian Optimization¶

In this tutorial we demonstrate the use of Xopt to preform Bayesian Optimization on a simple test problem.

Define the test problem¶

Here we define a simple optimization problem, where we attempt to minimize the sin function in the domian [0,2*pi]. Note that the function used to evaluate the objective function takes a dictionary as input and returns a dictionary as the output.

In [1]:

Copied!





from xopt.vocs import VOCS
from xopt.evaluator import Evaluator
from xopt.generators.bayesian import UpperConfidenceBoundGenerator
from xopt import Xopt
import torch
import matplotlib.pyplot as plt
import math
import numpy as np

# define variables and function objectives
vocs = VOCS(
    variables={"x": [0, 2 * math.pi]},
    objectives={"f": "MINIMIZE"},
)
from xopt.vocs import VOCS
from xopt.evaluator import Evaluator
from xopt.generators.bayesian import UpperConfidenceBoundGenerator
from xopt import Xopt
import torch
import matplotlib.pyplot as plt
import math
import numpy as np

# define variables and function objectives
vocs = VOCS(
    variables={"x": [0, 2 * math.pi]},
    objectives={"f": "MINIMIZE"},
)

In [2]:

Copied!

# define a test function to optimize
def sin_function(input_dict):
    return {"f": np.sin(input_dict["x"])}
# define a test function to optimize
def sin_function(input_dict):
    return {"f": np.sin(input_dict["x"])}

Create Xopt objects¶

Create the evaluator to evaluate our test function and create a generator that uses the Upper Confidence Bound acquisition function to perform Bayesian Optimization. Note that because we are optimizing a problem with no noise we set use_low_noise_prior=True in the GP model constructor.

In [3]:

Copied!





evaluator = Evaluator(function=sin_function)
generator = UpperConfidenceBoundGenerator(vocs=vocs)
generator.gp_constructor.use_low_noise_prior = True
X = Xopt(evaluator=evaluator, generator=generator, vocs=vocs)
evaluator = Evaluator(function=sin_function)
generator = UpperConfidenceBoundGenerator(vocs=vocs)
generator.gp_constructor.use_low_noise_prior = True
X = Xopt(evaluator=evaluator, generator=generator, vocs=vocs)

Generate and evaluate initial points¶

To begin optimization, we must generate some random initial data points. The first call to X.step() will generate and evaluate a number of randomly points specified by the generator. Note that if we add data to xopt before calling X.step() by assigning the data to X.data, calls to X.step() will ignore the random generation and proceed to generating points via Bayesian optimization.

In [4]:

Copied!

# call X.random_evaluate() to generate + evaluate 3 initial points
X.random_evaluate(2)

# inspect the gathered data
X.data
# call X.random_evaluate() to generate + evaluate 3 initial points
X.random_evaluate(2)

# inspect the gathered data
X.data

Out[4]:

	x	f	xopt_runtime	xopt_error
0	0.038763	0.038753	0.000009	False
1	2.945691	0.194651	0.000003	False

Do bayesian optimization steps¶

To perform optimization we simply call X.step() in a loop. This allows us to do intermediate tasks in between optimization steps, such as examining the model and acquisition function at each step (as we demonstrate here).

In [5]:

Copied!





n_steps = 5

# test points for plotting
test_x = torch.linspace(*X.vocs.bounds.flatten(), 50).double()

for i in range(n_steps):
    # get the Gaussian process model from the generator
    model = X.generator.train_model()

    # get acquisition function from generator
    acq = X.generator.get_acquisition(model)

    # calculate model posterior and acquisition function at each test point
    # NOTE: need to add a dimension to the input tensor for evaluating the
    # posterior and another for the acquisition function, see
    # https://botorch.org/docs/batching for details
    # NOTE: we use the `torch.no_grad()` environment to speed up computation by
    # skipping calculations for backpropagation
    with torch.no_grad():
        posterior = model.posterior(test_x.unsqueeze(1))
        acq_val = acq(test_x.reshape(-1, 1, 1))

    # get mean function and confidence regions
    mean = posterior.mean
    L, u = posterior.mvn.confidence_region()

    # plot model and acquisition function
    fig, ax = plt.subplots(2, 1, sharex="all")

    # plot model posterior
    ax[0].plot(test_x, mean, label="Posterior mean")
    ax[0].fill_between(test_x, L, u, alpha=0.25, label="Posterior confidence region")

    # add data to model plot
    ax[0].plot(X.data["x"], X.data["f"], "C1o", label="Training data")

    # plot true function
    true_f = sin_function({"x": test_x})["f"]
    ax[0].plot(test_x, true_f, "--", label="Ground truth")

    # add legend
    ax[0].legend()

    # plot acquisition function
    ax[1].plot(test_x, acq_val.flatten())

    ax[0].set_ylabel("f")
    ax[1].set_ylabel(r"$\alpha(x)$")
    ax[1].set_xlabel("x")

    # do the optimization step
    X.step()
n_steps = 5

# test points for plotting
test_x = torch.linspace(*X.vocs.bounds.flatten(), 50).double()

for i in range(n_steps):
    # get the Gaussian process model from the generator
    model = X.generator.train_model()

    # get acquisition function from generator
    acq = X.generator.get_acquisition(model)

    # calculate model posterior and acquisition function at each test point
    # NOTE: need to add a dimension to the input tensor for evaluating the
    # posterior and another for the acquisition function, see
    # https://botorch.org/docs/batching for details
    # NOTE: we use the `torch.no_grad()` environment to speed up computation by
    # skipping calculations for backpropagation
    with torch.no_grad():
        posterior = model.posterior(test_x.unsqueeze(1))
        acq_val = acq(test_x.reshape(-1, 1, 1))

    # get mean function and confidence regions
    mean = posterior.mean
    L, u = posterior.mvn.confidence_region()

    # plot model and acquisition function
    fig, ax = plt.subplots(2, 1, sharex="all")

    # plot model posterior
    ax[0].plot(test_x, mean, label="Posterior mean")
    ax[0].fill_between(test_x, L, u, alpha=0.25, label="Posterior confidence region")

    # add data to model plot
    ax[0].plot(X.data["x"], X.data["f"], "C1o", label="Training data")

    # plot true function
    true_f = sin_function({"x": test_x})["f"]
    ax[0].plot(test_x, true_f, "--", label="Ground truth")

    # add legend
    ax[0].legend()

    # plot acquisition function
    ax[1].plot(test_x, acq_val.flatten())

    ax[0].set_ylabel("f")
    ax[1].set_ylabel(r"$\alpha(x)$")
    ax[1].set_xlabel("x")

    # do the optimization step
    X.step()

/tmp/ipykernel_7761/2254313937.py:3: DeprecationWarning: __array_wrap__ must accept context and return_scalar arguments (positionally) in the future. (Deprecated NumPy 2.0)
  return {"f": np.sin(input_dict["x"])}
/tmp/ipykernel_7761/2254313937.py:3: DeprecationWarning: __array_wrap__ must accept context and return_scalar arguments (positionally) in the future. (Deprecated NumPy 2.0)
  return {"f": np.sin(input_dict["x"])}

/tmp/ipykernel_7761/2254313937.py:3: DeprecationWarning: __array_wrap__ must accept context and return_scalar arguments (positionally) in the future. (Deprecated NumPy 2.0)
  return {"f": np.sin(input_dict["x"])}

/tmp/ipykernel_7761/2254313937.py:3: DeprecationWarning: __array_wrap__ must accept context and return_scalar arguments (positionally) in the future. (Deprecated NumPy 2.0)
  return {"f": np.sin(input_dict["x"])}
/tmp/ipykernel_7761/2254313937.py:3: DeprecationWarning: __array_wrap__ must accept context and return_scalar arguments (positionally) in the future. (Deprecated NumPy 2.0)
  return {"f": np.sin(input_dict["x"])}

No description has been provided for this image

In [6]:

Copied!

# access the collected data
X.data
# access the collected data
X.data

Out[6]:

	x	f	xopt_runtime	xopt_error
0	0.038763	3.875297e-02	0.000009	False
1	2.945691	1.946515e-01	0.000003	False
2	6.283185	-2.449294e-16	0.000007	False
3	5.278993	-8.437287e-01	0.000006	False
4	4.892220	-9.838739e-01	0.000005	False
5	4.648386	-9.979525e-01	0.000005	False
6	4.735394	-9.997354e-01	0.000006	False

Getting the optimization result¶

To get the best point (without evaluating it) we ask the generator to predict the optimum based on the posterior mean.

In [7]:

Copied!

X.generator.get_optimum()
X.generator.get_optimum()

Out[7]:

	x
0	4.740502

Customizing optimization¶

Each generator has a set of options that can be modified to effect optimization behavior

In [8]:

Copied!

X.generator.dict()
X.generator.dict()

/tmp/ipykernel_7761/1542263183.py:1: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
  X.generator.dict()

Out[8]:

{'supports_single_objective': True,
 'supports_constraints': True,
 'model': ModelListGP(
   (models): ModuleList(
     (0): SingleTaskGP(
       (likelihood): GaussianLikelihood(
         (noise_covar): HomoskedasticNoise(
           (noise_prior): GammaPrior()
           (raw_noise_constraint): GreaterThan(1.000E-04)
         )
       )
       (mean_module): ConstantMean()
       (covar_module): RBFKernel(
         (lengthscale_prior): LogNormalPrior()
         (raw_lengthscale_constraint): GreaterThan(2.500E-02)
       )
       (outcome_transform): Standardize()
       (input_transform): Normalize()
     )
   )
   (likelihood): LikelihoodList(
     (likelihoods): ModuleList(
       (0): GaussianLikelihood(
         (noise_covar): HomoskedasticNoise(
           (noise_prior): GammaPrior()
           (raw_noise_constraint): GreaterThan(1.000E-04)
         )
       )
     )
   )
 ),
 'n_monte_carlo_samples': 128,
 'turbo_controller': None,
 'use_cuda': False,
 'gp_constructor': {'name': 'standard',
  'use_low_noise_prior': True,
  'covar_modules': {},
  'mean_modules': {},
  'trainable_mean_keys': [],
  'transform_inputs': True,
  'custom_noise_prior': None,
  'use_cached_hyperparameters': False},
 'numerical_optimizer': {'name': 'LBFGS',
  'n_restarts': 20,
  'max_iter': 2000,
  'max_time': 5.0},
 'max_travel_distances': None,
 'fixed_features': None,
 'computation_time':    training  acquisition_optimization
 0  0.080963                  0.034042
 1  0.065613                  0.024260
 2  0.077541                  0.059370
 3  0.058682                  0.026260
 4  0.045589                  0.029192,
 'custom_objective': None,
 'n_interpolate_points': None,
 'n_candidates': 1,
 'beta': 2.0}