PipeOp to try to repair prediction with unseen factor levels #71
Labels
Priority: Medium
Status: Contrib (unprepared)
In someone's opinion, this is an issue that could be handled by a contributor with the right support
Status: Needs Design
Needs some thought and design decisions.
Type: New PipeOp
Issue suggests a new PipeOp
problem: quite often a learner breaks because the table it has to predict on, possibly a large one, contains SOME observations with new, unseen factor levels. in such a case the predict call of the underlying learner fails completely.
see reprex here:
mlr-org/mlr3#97
this is really annoying, especially as it can affect only a few observations, yet the complete prediction still fails.
the current option is the mlr3 fallback learner. that does not really help, because it then produces fallback predictions for the complete test set, not just for the affected observations.
here is MAYBE a better option.
PipeOpUnseenLevels
before we go into the learner, we can store, at training time, which levels are present in each factor.
PipeOpUnseenLevels
train: task--stored-levels--->task
predict: task-->stored-levels-->task
train: simply stores a list, one element per factor feature, with the seen levels
predict: goes through all observations. for each observation where we see "unseen" levels, we create a random row by sampling each column from its marginal distribution.
that is a bit hacky, but should work?
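to make the train/predict idea above concrete, here is a minimal, language-agnostic sketch in plain Python. it is not mlr3pipelines code, and all names (`train_state`, `repair_rows`) are hypothetical. it assumes that "the marginals" means the empirical per-column distributions of the training data, which means the train step has to keep those values in addition to the seen levels. it also assumes the whole offending row is replaced, not just the offending column.

```python
import random


def train_state(rows, factor_cols):
    """Train step: remember the seen levels of every factor column."""
    levels = {col: {row[col] for row in rows} for col in factor_cols}
    # ASSUMPTION (not in the original proposal): also keep every column's
    # training values, used as that column's empirical marginal at predict time.
    marginals = {col: [row[col] for row in rows] for col in rows[0]}
    return {"levels": levels, "marginals": marginals}


def repair_rows(rows, state, rng=None):
    """Predict step: replace rows that contain an unseen factor level."""
    rng = rng or random.Random()
    repaired, replaced = [], []
    for i, row in enumerate(rows):
        has_unseen = any(
            row[col] not in seen for col, seen in state["levels"].items()
        )
        if has_unseen:
            # Sample each column independently from its training marginal.
            row = {col: rng.choice(vals) for col, vals in state["marginals"].items()}
            replaced.append(i)
        repaired.append(dict(row))
    return repaired, replaced


train_rows = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.5},
    {"color": "red", "size": 3.0},
]
new_rows = [
    {"color": "blue", "size": 2.0},   # only seen levels, kept as-is
    {"color": "green", "size": 9.0},  # "green" was never seen in training
]

state = train_state(train_rows, factor_cols=["color"])
repaired, replaced = repair_rows(new_rows, state, rng=random.Random(0))
print(replaced)  # [1]: only the second row was resampled
```

returning the indices of the replaced rows is an extra choice in this sketch. it would let a caller flag which predictions are based on a random row instead of the real observation.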