Hello,

Not sure if this is best here or in mlr3tuning?

What I want to do is tune a graph learner on a dataset that contains missing values in both predictor and target variables. To do this I start by imputing the dataset and then pipe the result into a learner. My problem comes when there are missing values in the target variable: the imputer does not impute these (which is good), but the tuning then throws an error when the imputed stage is piped to the learner, even if the learner should cope with NAs.
I can get around this by first removing all samples where the target variable is missing, but I want to avoid that because I get much better results if I impute with these samples included and drop them only after imputation. I can do this manually, but I can't get it to work within a graph learner. Is there a pipe operator for dropping NAs in the target variable that I could put between the imputation and the learner? I can't seem to find one.
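In case nothing like this exists, is a custom PipeOp the intended route? Below is a minimal sketch of what I have in mind, assuming a recent mlr3pipelines where PipeOpTaskPreproc exposes private .train_task()/.predict_task() hooks; PipeOpDropNATarget is my own hypothetical name, not an existing operator, and note it would only clean up the training stage (a resampling test set with NA targets would presumably still trip up the measure).
library(mlr3pipelines)
library(R6)
PipeOpDropNATarget = R6Class("PipeOpDropNATarget",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "drop_na_target") {
      # no feature-level work, so disable column subsetting
      super$initialize(id = id, can_subset_cols = FALSE)
    }
  ),
  private = list(
    .train_task = function(task) {
      # keep only the rows where every target column is non-missing
      keep = task$row_ids[complete.cases(task$data(cols = task$target_names))]
      task$filter(keep)
    },
    .predict_task = function(task) {
      # pass prediction tasks through untouched
      task
    }
  )
)
# it would then slot in between imputation and the learner, e.g.
# po("imputehist") %>>% PipeOpDropNATarget$new() %>>% po(lrn("regr.svm"))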
Here's a reprex to highlight the point:
library(mlr3verse)
# data with no missing values
data <- tibble::tibble(variable1 = 1:100, variable2 = 1:100, target = variable1^2 + variable2^2)
# data with missing target values
# first two samples are missing predictor values, 3rd sample is missing the target.
# if I exclude the 3rd sample everything is fine,
# but I want to include this sample in the imputation stage because
# the information this sample provides leads to better imputation
# hence better overall results so I only want to drop it after the imputation
data_w_missing <- data
data_w_missing[1, 1] <- NA_integer_
data_w_missing[2, 2] <- NA_integer_
data_w_missing[3, 3] <- NA_integer_
task <- TaskRegr$new(id = "test1", backend = data, target = "target")
task_w_missing <- TaskRegr$new(id = "test2", backend = data_w_missing, target = "target")
# even if I explicitly set the svm argument na.action = "na.omit",
# I still get an error during tuning.
# na.omit is the default anyway, as is the type, but type has to be
# set explicitly because cost (the parameter being tuned) depends on it
graph <- po("imputehist") %>>%
po(lrn("regr.svm", type = "eps-regression"))
graph$plot()
graph_learner <- GraphLearner$new(graph)
search_space = ps(
regr.svm.cost = p_dbl(lower = 0.1, upper = 1)
)
tuner <- tnr("grid_search", resolution = 3)
at = AutoTuner$new(
learner = graph_learner,
resampling = rsmp("cv", folds = 3),
measure = msr("regr.rmse"),
search_space = search_space,
terminator = trm("none"),
tuner = tuner
)
# tuning on the complete set works fine
at$train(task)
# tuning on the data with the missing target throws an error
# note: if the NAs are only in the predictors, there is no error;
# the error only occurs when the target contains missing values
at$train(task_w_missing)
# error message:
# Error in assert_regr(truth, response = response) :
# Assertion on 'truth' failed: Contains missing values (element 3).
# In addition: Warning messages:
# 1: In yorig - ret$fitted :
# longer object length is not a multiple of shorter object length
# 2: In yorig - ret$fitted :
# longer object length is not a multiple of shorter object length
# EDIT: just realised the issue is caused by svm returning a smaller
# set of predictions than there are rows in the data
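Here's a standalone demonstration of that behaviour, outside mlr3 entirely (toy data of my own making): e1071::svm with na.action = na.omit silently drops incomplete rows, so fitted() returns fewer values than there are rows in the input, which is what produces the recycling warnings above.
library(e1071)
d <- data.frame(x = 1:10, y = c(NA, (2:10)^2))
fit <- svm(y ~ x, data = d, na.action = na.omit)
length(fitted(fit))  # 9, not 10: the row with the missing target was dropped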