Implement sklearn suggestions for model maintainability #38

kdoroschak · 2020-04-27T22:23:23Z

The random forest classifier option is currently loaded following scikit-learn's model persistence instructions: https://scikit-learn.org/stable/modules/model_persistence.html

This approach will permanently require a specific version (or range of versions) of sklearn, additional checks like the ones the docs suggest (copied here), or both:

In order to rebuild a similar model with future versions of scikit-learn, additional metadata should be saved along the pickled model:

- The training data, e.g. a reference to an immutable snapshot
- The python source code used to generate the model
- The versions of scikit-learn and its dependencies
- The cross validation score obtained on the training data

This should make it possible to check that the cross-validation score is in the same range as before.
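One way the metadata listed above could be stored is as a JSON file written next to the pickle. The following is only a sketch under assumptions: the function names, the `.meta.json` suffix, the field names, and the default package list are all hypothetical and not part of this repository or of scikit-learn.

```python
import hashlib
import json
import pickle
from importlib import metadata
from pathlib import Path


def installed_versions(packages=("scikit-learn", "numpy", "scipy", "joblib")):
    """Return {package: version or None} for the packages the model depends on.

    The default package list is an assumption; adjust it to the real dependencies.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


def save_model_with_metadata(model, model_path, training_data_ref, source_ref, cv_score):
    """Pickle `model` and write a JSON sidecar file describing how it was built."""
    model_path = Path(model_path)
    payload = pickle.dumps(model)
    model_path.write_bytes(payload)

    sidecar = {
        "training_data": training_data_ref,   # e.g. a commit hash of the data repo
        "source_code": source_ref,            # e.g. notebook path plus commit hash
        "versions": installed_versions(),
        "cv_score": cv_score,
        "model_sha256": hashlib.sha256(payload).hexdigest(),  # ties metadata to this pickle
    }
    # Hypothetical naming scheme: model.pkl -> model.pkl.meta.json
    meta_path = model_path.with_name(model_path.name + ".meta.json")
    meta_path.write_text(json.dumps(sidecar, indent=2))
    return meta_path
```

At load time, the sidecar file can be read back and its `versions` and `cv_score` fields compared against the current environment before the pickle is trusted.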

Code/data location:

- classification/NanoporeTER_Random_Forest_classifier.ipynb
  - If run on misl-a, it should run out of the box with no modifications.
- Training data is also here: https://github.com/uwmisl/NanoporeTER-data

The training data is too large to include directly, but maybe we can create a small dataset for a unit test that acts as a sentinel for "hey, something changed, check the sklearn version".
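The sentinel idea could look roughly like the check below. It is a sketch only: `sentinel_problems`, the layout of the `recorded` dictionary, and the default tolerance are hypothetical choices, and the cross-validation score on the small dataset is assumed to be computed elsewhere (for example with `sklearn.model_selection.cross_val_score`).

```python
from importlib import metadata


def sentinel_problems(recorded, current_cv_score, tolerance=0.02, installed=None):
    """Return a list of human-readable problems; an empty list means all clear.

    recorded: dict with "versions" ({package: version}) and "cv_score" (float),
              saved when the model was trained (hypothetical layout).
    current_cv_score: score obtained now on the small sentinel dataset.
    installed: optional {package: version} override, mainly for testing;
               by default the versions are looked up in the running environment.
    """
    problems = []

    for package, wanted in recorded["versions"].items():
        if installed is not None:
            found = installed.get(package)
        else:
            try:
                found = metadata.version(package)
            except metadata.PackageNotFoundError:
                found = None
        if found != wanted:
            problems.append(f"{package}: recorded {wanted}, installed {found}")

    # The score only needs to be "in the same range as before", not identical.
    drift = abs(current_cv_score - recorded["cv_score"])
    if drift > tolerance:
        problems.append(
            f"cv score drifted by {drift:.3f} (recorded {recorded['cv_score']}, "
            f"now {current_cv_score}); check the sklearn version"
        )
    return problems
```

A unit test could then assert that `sentinel_problems(...)` returns an empty list, so that a version bump or a score shift fails loudly instead of silently changing predictions.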

kdoroschak added the enhancement label Apr 27, 2020
