Hello,

Not sure if this is best here or in mlr3tuning?

What I want to do is tune a graph learner on a dataset that contains missing values in both predictor and target variables. To do this I start by imputing the dataset and then pipe the result into a learner. My problem comes when there are missing values in the target variable: the imputer does not impute these (which is good), but the tuning then throws an error when the imputed stage is piped to the learner, even if the learner should cope with NAs.
I can get around this by first removing all samples where the target variable is missing, but I want to avoid that because I get much better results if I impute with these samples included and drop them only after imputation. I can do this manually, but I can't get it to work within a graph learner. Is there a pipe operator for dropping NAs in the target variable that I could put between the imputation and the learner? I can't seem to find one.
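In case nothing like this exists, is a custom PipeOp the intended route? Below is a minimal sketch of what I have in mind, assuming a recent mlr3pipelines where PipeOpTaskPreproc exposes private .train_task()/.predict_task() hooks; PipeOpDropNATarget is my own hypothetical name, not an existing operator, and note it would only clean up the training stage (a resampling test set with NA targets would presumably still trip up the measure).
library(mlr3pipelines)
library(R6)
PipeOpDropNATarget = R6Class("PipeOpDropNATarget",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "drop_na_target") {
      # no feature-level work, so disable column subsetting
      super$initialize(id = id, can_subset_cols = FALSE)
    }
  ),
  private = list(
    .train_task = function(task) {
      # keep only the rows where every target column is non-missing
      keep = task$row_ids[complete.cases(task$data(cols = task$target_names))]
      task$filter(keep)
    },
    .predict_task = function(task) {
      # pass prediction tasks through untouched
      task
    }
  )
)
# it would then slot in between imputation and the learner, e.g.
# po("imputehist") %>>% PipeOpDropNATarget$new() %>>% po(lrn("regr.svm"))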
Here's a reprex to highlight the point:
library(mlr3verse)
# data with no missing values
data <- tibble::tibble(variable1 = 1:100, variable2 = 1:100, target = variable1^2 + variable2^2)
# data with missing target values
# first two samples are missing predictor values, 3rd sample is missing the target.
# if I exclude the 3rd sample everything is fine,
# but I want to include this sample in the imputation stage because
# the information this sample provides leads to better imputation
# hence better overall results so I only want to drop it after the imputation
data_w_missing <- data
data_w_missing[1, 1] <- NA_integer_
data_w_missing[2, 2] <- NA_integer_
data_w_missing[3, 3] <- NA_integer_
task <- TaskRegr$new(id = "test1", backend = data, target = "target")
task_w_missing <- TaskRegr$new(id = "test2", backend = data_w_missing, target = "target")
# even if I explicitly set the svm argument na.action = "na.omit",
# I still get an error during tuning.
# na.omit is the default anyway, as is the type, but type has to be
# set explicitly because cost (the parameter being tuned) depends on it
graph <- po("imputehist") %>>%
po(lrn("regr.svm", type = "eps-regression"))
graph$plot()
graph_learner <- GraphLearner$new(graph)
search_space = ps(
regr.svm.cost = p_dbl(lower = 0.1, upper = 1)
)
tuner <- tnr("grid_search", resolution = 3)
at = AutoTuner$new(
learner = graph_learner,
resampling = rsmp("cv", folds = 3),
measure = msr("regr.rmse"),
search_space = search_space,
terminator = trm("none"),
tuner = tuner
)
# tuning on the complete set works fine
at$train(task)
# tuning on the data with the missing target throws an error
# note: if the NAs are only in the predictors, there is no error;
# the error only occurs when the target contains missing values
at$train(task_w_missing)
# error message:
# Error in assert_regr(truth, response = response) :
# Assertion on 'truth' failed: Contains missing values (element 3).
# In addition: Warning messages:
# 1: In yorig - ret$fitted :
# longer object length is not a multiple of shorter object length
# 2: In yorig - ret$fitted :
# longer object length is not a multiple of shorter object length
# EDIT: just realised the issue is caused by svm returning a smaller
# set of predictions than there are rows in the data
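Here's a standalone demonstration of that behaviour, outside mlr3 entirely (toy data of my own making): e1071::svm with na.action = na.omit silently drops incomplete rows, so fitted() returns fewer values than there are rows in the input, which is what produces the recycling warnings above.
library(e1071)
d <- data.frame(x = 1:10, y = c(NA, (2:10)^2))
fit <- svm(y ~ x, data = d, na.action = na.omit)
length(fitted(fit))  # 9, not 10: the row with the missing target was dropped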