creating objective function from a dataset #98

Closed
hududed opened this issue Jan 18, 2023 · 7 comments
hududed commented Jan 18, 2023

Previously, with mlrMBO, we were able to instantiate the model with batch-ai-data.csv passed to initSMBO as the design:

library(mlrMBO)
...
data = subset(read.csv("batch-ai-data.csv"), select = -c(ratio))
opt.state = initSMBO(par.set = ps, design = data, control = ctrl, minimize = FALSE, noisy = TRUE)

From the mlr3mbo docs it's clear how to create the objective function ObjectiveRFun$new(fun, domain, codomain) from an existing function fun. Is there an example of how to create the objective function without a known fun, but instead from a dataset, as with initSMBO above?

hududed changed the title from "mlrMBO initSMBO equivalent in mlr3mbo" to "creating objective function from a dataset" on Jan 19, 2023
sumny (Member) commented Jan 19, 2023

Hi @hududed,

I assume that what you are trying to achieve is some kind of human-in-the-loop BO, perhaps similar to this vignette for the old mlrMBO: https://mlrmbo.mlr-org.com/articles/supplementary/human_in_the_loop_MBO.html.

Could you provide some additional info? Is the function you want to optimize a black box that simply stores data (X and f(X)) on disk?

In general, for human-in-the-loop BO within mlr3mbo, I have two suggestions:

  1. Simply design an Objective that allows for human-in-the-loop evaluation:
library(mlr3mbo)
library(mlr3learners)
library(mlr3misc)  # provides map_chr() used below
library(bbotk)
library(data.table)

set.seed(1)

# helper function to print xs
format_xs = function(xs) {
  paste0(map_chr(seq_along(xs), function(i) paste0(names(xs)[[i]], ": ", xs[[i]])), collapse = "; ")
}

# function that waits for evaluation and accepts user input
fun = function(xs) {
  y = readline(prompt = paste0("Evaluate: ", format_xs(xs), "\n", "y = "))
  list(y = as.numeric(y))
}

# objective
obfun = ObjectiveRFun$new(
  fun = fun,
  domain = ps(q = p_dbl(lower = -1, upper = 2), v = p_dbl(lower = -2, upper = 3)),
  codomain = ps(y = p_dbl(tags = "minimize")),
  properties = "noisy")

# instance
instance = OptimInstanceSingleCrit$new(
  objective = obfun,
  terminator = trm("evals", n_evals = 10))

# evaluate a custom design
design = data.table(q = c(0.5, 1, 2, -0.9, 1.8), v = c(-1.9, 1, 0, 2, 2.9))
instance$eval_batch(design)

# continue the optimization with mbo
opt("mbo")$optimize(instance)

When you run this code, you will see that each evaluation of an xs waits for user input, so you can simply "evaluate" your black-box function yourself and enter the objective value.
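
After the optimizer finishes (here, once the terminator stops it after 10 evaluations), you can inspect everything through the instance; a short sketch using the standard bbotk accessors:

instance$archive$data  # all evaluations: the custom design plus the MBO proposals
instance$result        # best configuration found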

If this does not work for you, you can try working with the primitives of mlr3mbo directly:

# example with your data
# probably not meaningful because you did not provide much info
# assume maximization of ratio depending on power, time, pressure, resistance
set.seed(1)

data = data.table(read.csv("batch-ai-data.csv"))
search_space = ps(power = p_int(lower = 0, upper = 1000),
                  time = p_int(lower = 0, upper = 5000),
                  pressure = p_int(lower = 100, upper = 500),
                  resistance = p_dbl(lower = 0, upper = 1))
codomain = ps(ratio = p_dbl(tags = "maximize"))

#data[, batch_nr := 1]  # needed because Archive methods rely on it; assume data is the initial design

# construct the archive manually
archive = Archive$new(search_space = search_space, codomain = codomain)

# initialize archive with data
# archive$data = data
archive$add_evals(xdt = data[, c("power", "time", "pressure", "resistance")], ydt = data[, "ratio"])

# then work with the primitives as you would in `?bayesopt_ego`

# create a surrogate, acquisition function and acquisition function optimizer, for defaults, see `?mbo_defaults`
surrogate = srlrn(lrn("regr.km", control = list(trace = FALSE)), archive = archive)  # GP
acq_function = acqf("ei", surrogate = surrogate) # EI
acq_optimizer = acqo(opt("random_search", batch_size = 1000),
                     terminator = trm("evals", n_evals = 1000),
                     acq_function = acq_function) # small random search


# now everything is initialized
# the following would be done repeatedly, i.e., this is now manually performing one iteration of the BO loop
acq_function$surrogate$update()
acq_function$update()
candidate = acq_optimizer$optimize()  # tells you which candidate to evaluate
candidate

# proceed to "evaluate" the candidate (or any other point you want to) and update the archive manually
data_new = data.table(power = 370, time = 2779, pressure = 178, resistance = 0.05585319, ratio = 5)

#data_new[, batch_nr := archive$n_batch + 1]  # we just evaluated a new point so we added the next batch
#archive$data = rbind(archive$data, data_new, fill = TRUE)
archive$add_evals(xdt = data_new[, c("power", "time", "pressure", "resistance")], ydt = data_new[, "ratio"])

# proceed to determine the best result manually (e.g., by surrogate prediction) ...

I hope you find this helpful.

hududed (Author) commented Jan 19, 2023

Yes, I think this is very helpful! Your second solution seems more appropriate, but let me expand with more info.

I plan to train on the initial data set and propose the next N candidates (14 in my case, perhaps using bayesopt_mpcl?), run the 14 candidate experiments, and then add the 14 new rows/points as data_new.

I will try your solution and give updates soon, thanks!

hududed (Author) commented Jan 19, 2023

@sumny So the second solution proposes a single candidate fine, but I am not sure how to make a multi-point proposal without using something like args here:

optimizer = opt("mbo",
                loop_function = bayesopt_mpcl,
                surrogate = surrogate,
                acq_function = acq_function,
                acq_optimizer = acq_optimizer,
                args = list(q = 14, liar = min))

How do I work with the primitives of bayesopt_mpcl?

sumny (Member) commented Jan 22, 2023

@hududed

Here is an example of how to work with the primitives of bayesopt_mpcl at a very low level:

# example with your data
# assume maximization of ratio depending on power, time, pressure, resistance
set.seed(1)

data = data.table(read.csv("batch-ai-data.csv"))
search_space = ps(power = p_int(lower = 0, upper = 1000),
                  time = p_int(lower = 0, upper = 5000),
                  pressure = p_int(lower = 100, upper = 500),
                  resistance = p_dbl(lower = 0, upper = 1))
codomain = ps(ratio = p_dbl(tags = "maximize"))

#data[, batch_nr := 1]  # needed because Archive methods rely on it; assume data is the initial design

# construct the archive manually
archive = Archive$new(search_space = search_space, codomain = codomain)

# initialize archive with data
# archive$data = data
archive$add_evals(xdt = data[, c("power", "time", "pressure", "resistance")], ydt = data[, "ratio"])

# then work with the primitives as you would in `?bayesopt_mpcl`

# create a surrogate, acquisition function and acquisition function optimizer, for defaults, see `?mbo_defaults`
surrogate = srlrn(lrn("regr.km", control = list(trace = FALSE)), archive = archive)  # GP
acq_function = acqf("ei", surrogate = surrogate) # EI
acq_optimizer = acqo(opt("random_search", batch_size = 1000),
                     terminator = trm("evals", n_evals = 1000),
                     acq_function = acq_function) # small random search

q = 14  # we want 14 proposals
lie = data.table()  # will hold the constant "lie" value used by the constant liar
liar = mean  # liar function, e.g., constant mean

# now everything is initialized
# the following would be done repeatedly, i.e., this is now manually performing one iteration of the BO loop
# ----- begin of loop
acq_function$surrogate$update()
acq_function$update()
candidate = acq_optimizer$optimize()  # first candidate

# prepare lie objects
tmp_archive = archive$clone(deep = TRUE)
acq_function$surrogate$archive = tmp_archive
lie[, archive$cols_y := liar(archive$data[[archive$cols_y]])]
candidate_new = candidate

# obtain the other q-1 candidates using fake archive
for (i in seq_len(q)[-1L]) {
  tmp_archive$add_evals(xdt = candidate_new, xss_trafoed = transform_xdt_to_xss(candidate_new, tmp_archive$search_space), ydt = lie)
  # update all objects with lie and obtain new candidate
  acq_function$surrogate$update()
  acq_function$update()
  candidate_new = acq_optimizer$optimize()
  candidate = rbind(candidate, candidate_new)
}

acq_function$surrogate$archive = archive  # reset the working archive to the actual one and not the temporary lie archive

# proceed to "evaluate" the candidates and update the archive manually
data_new = data.table(power = c(370, 352, ...),
                      time = c(2779, 788, ...),
                      pressure = c(178, 160, ...),
                      resistance = c(0.05585319, 0.21729239, ...),
                      ratio = c(5, 9, ...))  # evaluate all 14 candidates (indicated via dots)
#data_new[, batch_nr := archive$n_batch + 1]  # we just evaluated a new batch
#archive$data = rbind(archive$data, data_new, fill = TRUE)
archive$add_evals(xdt = data_new[, c("power", "time", "pressure", "resistance")], ydt = data_new[, "ratio"])

# proceed to determine the best result manually (e.g., by surrogate prediction) ...
# ----- end of loop

This is quite a lot of code now. It might work for you; however, I will think about adding more high-level support for human-in-the-loop scenarios such as yours (see issue #100).

On a side note, another possibility would be to write the evaluation function of your Objective so that it operates on the data on disk: it waits until new rows matching the xs values to be evaluated have been appended to the data, and then "evaluates" the xs simply by reading the data back in.

This way you would not have to work with the primitives on such a low level.
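
For illustration, a rough, untested sketch of such a polling evaluation function; the exact matching rule, tolerance, and polling interval are assumptions:

# block until a row matching xs appears in the CSV, then return its ratio
fun = function(xs) {
  repeat {
    data = data.table(read.csv("batch-ai-data.csv"))
    hit = data[power == xs$power & time == xs$time &
               pressure == xs$pressure & abs(resistance - xs$resistance) < 1e-8]
    if (nrow(hit) > 0L) return(list(ratio = hit$ratio[1L]))
    Sys.sleep(10)  # poll the file every 10 seconds
  }
}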

hududed (Author) commented Feb 2, 2023

Thanks for this. For single-point proposals, the loop to update the surrogate works for some learners, e.g., regr.ranger, regr.km, regr.lm, but not for others. For example, regr.ksvm or regr.lightgbm gives me the following error:

...
surrogate = srlrn(lrn("regr.ksvm"), archive = archive) 
acq_function = acqf("ei", surrogate = surrogate)
acq_optimizer = acqo(opt("random_search", batch_size = 1000),
                     terminator = trm("evals", n_evals = 1000),
                     acq_function = acq_function)
acq_function$surrogate$update()
acq_function$update()
candidate = acq_optimizer$optimize() 
WARN  [22:36:09.213] [bbotk] Assertion on 'y' failed: May not be NA.
Error: Assertion on 'y' failed: May not be NA.
Traceback:

1. acq_optimizer$optimize()
2. .__AcqOptimizer__optimize(self = self, private = private, super = super)
3. tryCatch(self$optimizer$optimize(instance), error = function(error_condition) {
 .     lg$warn(error_condition$message)
 .     stop(set_class(list(message = error_condition$message, call = NULL), 
 .         classes = c("acq_optimizer_error", "mbo_error", "error", 
 .             "condition")))
 . })
4. tryCatchList(expr, classes, parentenv, handlers)
5. tryCatchOne(expr, names, parentenv, handlers[[1L]])
6. value[[3L]](cond)

sumny (Member) commented Feb 2, 2023

First, note that I updated the examples above to use archive$add_evals instead of simply overwriting $data, which I believe is the better way to do this (e.g., archive$add_evals(xdt = data[, c("power", "time", "pressure", "resistance")], ydt = data[, "ratio"])).

Regarding your errors: this is a bug. Expected improvement (AcqFunctionEI) requires an se prediction, which is not implemented in the regr.ksvm or regr.lightgbm learners:

lrn("regr.lightgbm")
* Model: -
* Parameters: num_threads=1, verbose=-1, objective=regression,
  convert_categorical=TRUE
* Packages: mlr3, mlr3extralearners, lightgbm
* Predict Types:  [response]
* Feature Types: logical, integer, numeric, factor
* Properties: hotstart_forward, importance, missings, weights

(Note the line * Predict Types: [response]; the se predict type is missing.)

We should assert this in mlr3mbo, I'll open an issue.

If you still want to use regression models without an se prediction, you can only use AcqFunctionMean as the acquisition function. I am not sure whether this is sensible, though.
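
For example, the change would just be the following (a sketch; "mean" is the dictionary key of AcqFunctionMean):

# use the posterior mean instead of EI for learners without se support
surrogate = srlrn(lrn("regr.lightgbm"), archive = archive)
acq_function = acqf("mean", surrogate = surrogate)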

hududed (Author) commented Feb 2, 2023

Ah, I missed that edit! Thanks. OK, I may just look for learners with both response and se predict types for now.

sumny closed this as completed on Mar 2, 2023